Enhancing Smart Factories with Real-Time Scene Understanding for AMRs

  • April 30, 2025
  • news

Discover how SmartEdge uses synchronized RGB-depth cameras and semantic graph fusion to provide AMRs with a shared, real-time understanding of their environment. This enables robots to avoid obstacles, reschedule dynamically and work together seamlessly – for greater flexibility, safety and efficiency in the smart factory.

In modern “Industry 4.0” environments, flexibility and responsiveness are as crucial as efficiency. Traditional automated factories often rely on fixed layouts and predefined paths, but Autonomous Mobile Robots (AMRs) demand more: the ability to perceive, interpret and adapt to dynamic surroundings. Deliverable 5.2 of the SmartEdge Project introduces the manufacturing scene understanding artifact (A5.1.2.2, complemented by A3.11), a solution that empowers AMRs and other smart devices to build a shared, real-time model of their operational environment and collaborate seamlessly.

Key Challenges in Manufacturing vs. Traffic Scenarios

Although traffic and manufacturing both use RGB-depth cameras and stream semantic graphs, their environments differ fundamentally:

  • Manufacturing is indoors, under stable lighting, with occasional smoke or steam, whereas traffic is outdoors with variable lighting and weather.
  • Factory objects may move unpredictably (e.g., Mecanum-wheeled robots crabbing sideways) and often crowd or occlude each other for extended periods, whereas traffic objects tend to follow predictable paths, keep their distance and fall into fewer object classes.
  • The flat factory floor simplifies depth-based foreground detection, while uneven roads complicate it.

These differences lead to distinct assumptions and specialized processing pipelines in the manufacturing artifact.

Solution Overview

At its core, the artifact couples RGB images and depth maps from ceiling-mounted (or potentially onboard) cameras with a sequence of sensor-specific processing pipelines. Outputs are RDF-based scene understanding graphs that describe object identities, locations and relationships. By sharing these graphs across an edge-deployed “swarm” of smart-nodes, all devices obtain a richer, multi-view understanding—even beyond their own sensors’ line-of-sight.
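
As a rough illustration of what such a scene understanding graph might contain, the sketch below builds a minimal RDF graph with rdflib. The namespace, class names and property names are illustrative assumptions, not the project's actual ontology.

```python
# Minimal RDF scene-graph sketch. The "sf:" namespace and all class/property
# names are hypothetical placeholders for the SmartEdge ontology.
from rdflib import Graph, Literal, Namespace, RDF
from rdflib.namespace import XSD

SF = Namespace("http://example.org/smart-factory#")  # assumed namespace

g = Graph()
g.bind("sf", SF)

# One detected object: an AMR identified via its QR tag, positioned in the
# floor frame estimated from a calibrated ceiling camera.
amr = SF["amr-042"]
g.add((amr, RDF.type, SF.AutonomousMobileRobot))
g.add((amr, SF.detectedBy, SF["ceiling-camera-03"]))
g.add((amr, SF.positionX, Literal(12.4, datatype=XSD.float)))  # metres
g.add((amr, SF.positionY, Literal(3.1, datatype=XSD.float)))
g.add((amr, SF.occludes, SF["rack-07"]))  # spatial relationship

print(g.serialize(format="turtle"))
```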

Components Used in the Implementation

  • Sensors & IoT Gateways: Intel L515 LiDAR and D455 stereo-depth cameras capture synchronized RGB-depth streams. Mounted high in the factory, they connect via USB to Dell 5200 IoT gateways, each running the SmartEdge pipelines.
  • Calibration & Identification: Floor- and wall-mounted QR codes (unique “Thing” entries in the Thing Description Directory, TDD) serve dual roles: they help calibrate each camera’s projection matrix and furnish unambiguous object IDs when captured.
  • Swarm Communication: All smart-nodes interconnect over a specialized wireless network (T4.1/A4.1-6) and exchange scene graphs via Zenoh Message-Oriented Middleware (A3.2). The TDD (A3.3) indexes node capabilities, topic paths and fields of view.
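
As a rough illustration of the scene-graph exchange described in the last item, the following sketch publishes a serialized graph with the eclipse-zenoh Python bindings. The key expression and payload are assumptions for illustration; in SmartEdge the actual topic paths are indexed in the TDD.

```python
# Minimal publishing sketch, assuming the eclipse-zenoh Python package
# (pip install eclipse-zenoh). Key expression and payload are illustrative.
import zenoh

def publish_scene_graph(turtle_text: str) -> None:
    # Open a Zenoh session with the default configuration and publish the
    # serialized RDF scene graph under a per-camera key expression.
    session = zenoh.open(zenoh.Config())
    session.put("smartedge/uc3/scene-graphs/ceiling-camera-03", turtle_text)
    session.close()

if __name__ == "__main__":
    publish_scene_graph("@prefix sf: <http://example.org/smart-factory#> .")
```

Subscribing nodes can declare a subscriber on a wildcard key expression covering the cameras in their area to receive each camera's graph as it is published.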

Processing Pipelines

  • Media Stream Processing normalizes RGB and depth feeds, synchronizing frame rates and viewpoints.
  • Foreground Detection exploits the flat factory floor: a baseline depth map is computed once, then subtracted frame-by-frame to isolate both static (e.g., pillars) and moving objects.
  • Object Classification uses a fine-tuned YOLO model (based on COCO plus custom images of robots, racks, trays, conveyors, servers, pallets, etc.) to label items; uncertain cases are marked “Unknown.” CAD-derived URDF models generate synthetic training images for continuous improvement.
  • Identifying-Mark Detection locates and decodes QR codes via libraries like pyzbar, yielding instance-level IDs for racks, AMRs and other tagged equipment.
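
As a concrete illustration of the Foreground Detection and Identifying-Mark Detection steps, the sketch below subtracts a baseline depth map and decodes QR tags. Frame formats, thresholds and helper names are assumptions, not the deliverable's actual pipeline code.

```python
# Sketch of depth-based foreground detection and QR decoding for a downward-
# facing ceiling camera. Depth frames are assumed to be NumPy arrays in
# millimetres; the 50 mm threshold is an illustrative choice.
import numpy as np
import cv2
from pyzbar.pyzbar import decode

def foreground_mask(depth_frame: np.ndarray,
                    baseline_depth: np.ndarray,
                    min_height_mm: float = 50.0) -> np.ndarray:
    # Objects standing on the flat floor appear closer to the ceiling camera
    # than the baseline, so baseline minus current depth is positive over them.
    diff = baseline_depth.astype(np.float32) - depth_frame.astype(np.float32)
    mask = (diff > min_height_mm).astype(np.uint8) * 255
    # Light morphological opening to suppress depth-sensor noise.
    kernel = np.ones((5, 5), np.uint8)
    return cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)

def read_qr_ids(rgb_frame: np.ndarray) -> list[tuple[str, tuple]]:
    # pyzbar returns decoded symbols with bounding rectangles, which can be
    # projected into floor coordinates using the calibrated camera matrix.
    return [(d.data.decode("utf-8"), d.rect) for d in decode(rgb_frame)]
```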

Semantic Fusion & Tracking

To maintain object continuity across frames – even through long occlusions – the system adapts DeepSORT for graph-based prediction. The predicted graph for the next time step, Gt+1, is reconciled with fresh pipeline outputs, weighted by confidence, to relabel objects and update their positions. Enrichment from TDD properties (e.g., robot weight, capabilities) further augments each entity’s semantic context.
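
The reconciliation step can be pictured with a small matching sketch: predicted object positions are paired with fresh, confidence-weighted detections. The data classes, cost function and gating distance below are assumptions for illustration; the deliverable's actual DeepSORT adaptation additionally uses appearance features and motion models.

```python
# Rough sketch of confidence-weighted matching between predicted objects and
# new detections, using the Hungarian algorithm. All names and thresholds are
# illustrative, not the deliverable's implementation.
from dataclasses import dataclass
import numpy as np
from scipy.optimize import linear_sum_assignment

@dataclass
class TrackedObject:
    object_id: str        # stable instance ID (e.g., from a QR tag)
    position: np.ndarray  # predicted (x, y) in floor coordinates, metres

@dataclass
class Detection:
    label: str            # class label from the YOLO pipeline
    position: np.ndarray  # measured (x, y) in floor coordinates, metres
    confidence: float     # classifier confidence in [0, 1]

def reconcile(tracks: list[TrackedObject],
              detections: list[Detection],
              max_distance_m: float = 1.0) -> list[tuple[TrackedObject, Detection]]:
    """Assign each predicted object to at most one new detection."""
    if not tracks or not detections:
        return []
    # Cost combines distance and inverse detection confidence, so confident
    # detections win when two candidates are equally close.
    cost = np.zeros((len(tracks), len(detections)))
    for i, t in enumerate(tracks):
        for j, d in enumerate(detections):
            cost[i, j] = np.linalg.norm(t.position - d.position) / max(d.confidence, 1e-3)
    rows, cols = linear_sum_assignment(cost)
    # Reject pairs that are too far apart to be the same physical object.
    return [(tracks[i], detections[j]) for i, j in zip(rows, cols)
            if np.linalg.norm(tracks[i].position - detections[j].position) <= max_distance_m]
```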

Applications in Use Case 3: Smart Factories

  • Collaborative Navigation: AMRs subscribe to scene graphs from multiple cameras overlooking their target area. A higher-capacity node fuses these streams into a unified 3D semantic model and derives a 2D occupancy map (via NAV2/ROS2) for robust path planning, obstacle avoidance and dynamic rerouting (see the occupancy-map sketch after this list).
  • Safety & Alerts: Unexpected pallets trigger staff notifications. Person detection halts robots automatically.
  • Adaptability: Instead of rigidly fixed pathways, robots adjust to moved racks, ad-hoc obstacles or layout changes—boosting uptime and throughput.
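
As a rough sketch of the occupancy-map derivation mentioned in the first item, the function below flattens fused 3D foreground points onto a 2D grid with Nav2-style cell values. The grid resolution, frame conventions and the handoff to Nav2 (e.g., as a nav_msgs/OccupancyGrid message) are assumptions for illustration.

```python
# Sketch: project fused 3D foreground points (in a common floor frame, metres)
# onto a 2D occupancy grid. Cell values follow the 0 = free / 100 = occupied
# convention used by ROS 2 occupancy grids; the resolution is an assumed choice.
import numpy as np

def occupancy_grid(points_xyz: np.ndarray,
                   grid_size: tuple[int, int] = (200, 200),
                   resolution_m: float = 0.05,
                   min_height_m: float = 0.05) -> np.ndarray:
    grid = np.zeros(grid_size, dtype=np.int8)
    # Keep only points that rise above the floor plane.
    obstacles = points_xyz[points_xyz[:, 2] > min_height_m]
    # Convert metric x/y coordinates to grid cell indices.
    cells = (obstacles[:, :2] / resolution_m).astype(int)
    in_bounds = ((cells[:, 0] >= 0) & (cells[:, 0] < grid_size[0]) &
                 (cells[:, 1] >= 0) & (cells[:, 1] < grid_size[1]))
    cells = cells[in_bounds]
    grid[cells[:, 0], cells[:, 1]] = 100  # mark occupied cells
    return grid
```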

Conclusion

By integrating calibrated RGB-depth sensing, multi-modal pipelines and semantic graph fusion within an edge-native swarm, the SmartEdge manufacturing scene understanding artifact delivers a powerful foundation for truly autonomous, collaborative and adaptable AMRs. This approach paves the way toward smarter, safer and more flexible factories of the future.
