CLONE is a whole-body teleoperation system that achieves comprehensive robot control using a VR headset. It enables previously unattainable skills, such as picking up an object from the ground and placing it in a distant bin, facilitates the collection of long-horizon interaction data, and establishes a foundation for more capable human-robot interaction in both research and practical applications.
Existing teleoperation methods face two major challenges: (i) restricted controllability due to decoupled upper- and lower-body control, and (ii) significant global tracking drift resulting from open-loop execution, particularly in long-horizon tasks. These limitations hinder humanoid robots from executing synchronized whole-body motions necessary for long-horizon loco-manipulation tasks.
In this paper, we introduce CLONE, a whole-body teleoperation system that overcomes these challenges through three components:
Model Architecture -- A Mixture-of-Experts (MoE) whole-body control policy that enables complex coordinated movements, such as “picking up an object from the ground” and “placing it in a distant bin”;
System Integration -- A closed-loop error correction mechanism using LiDAR odometry, reducing translational drift to 12 cm over 8.9-meter trajectories;
Data Curation -- A systematic data augmentation strategy that ensures robust performance under diverse, previously unseen operator poses.
All videos show real-time teleoperation at 1x speed using a unified policy.
Whole-Body Tracking: The robot tracks a variety of motions stably and precisely. Notably, it covers 15 meters while transitioning between poses as it walks, then returns to the start position with minimal drift.
Long-Horizon Motion Tracking
Upper-Body Motion Tracking
Turning
Side-Stepping
Squatting and Walking
Robust and Accurate Global Position Tracking
Circular Walking
Interactive Tasks: The robot demonstrates smooth, precise interaction capabilities.
Playing Table Tennis
Teleoperating the humanoid to play table tennis using forehand strokes driven by waist movement.
Playing Table Tennis
Teleoperating the humanoid to play table tennis using backhand strokes.
Tabletop Object Manipulation
Tabletop Object Handover
Long-Horizon Interactive Tasks: The humanoid performs precise, long-horizon interactions with closed-loop error correction.
Single-Handed Object Retrieval from the Ground
Dual-Handed Object Retrieval from the Ground
Dual-Handed Pick-and-Place
Dual-Handed General Pick-and-Place
Squatting and Wiping
Standing Wiping

Framework and structure of CLONE. CLONE curates and augments the retargeted AMASS dataset through motion editing to introduce diverse humanoid motions and detailed hand movements. We employ an MoE network as the student policy, distilling it from a teacher policy trained with privileged information. For real-world deployment, we integrate LiDAR odometry into the system to obtain real-time humanoid states, enabling closed-loop error correction.
We adopt an MoE framework that enables a unified policy to learn diverse motion skills while synthesizing lower-body motions coordinated with upper-body actions.
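To make the mixture-of-experts idea concrete, the following is a minimal sketch of a densely gated MoE forward pass, not the CLONE network itself. The class name `ToyMoEPolicy`, the linear experts, the softmax gate, and all dimensions are assumptions chosen for illustration; the source does not specify the architecture at this level of detail.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, x):
    """Multiply a row-major matrix (list of rows) by the vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


class ToyMoEPolicy:
    """Hypothetical dense MoE: each expert is a single linear layer."""

    def __init__(self, obs_dim, act_dim, num_experts, seed=0):
        rng = random.Random(seed)

        def rand_matrix(rows, cols):
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)]
                    for _ in range(rows)]

        # Gate maps an observation to one logit per expert.
        self.gate_w = rand_matrix(num_experts, obs_dim)
        # Each expert maps an observation to an action vector.
        self.expert_w = [rand_matrix(act_dim, obs_dim)
                         for _ in range(num_experts)]

    def gate(self, obs):
        """Return mixing weights over experts (non-negative, sum to 1)."""
        return softmax(linear(self.gate_w, obs))

    def act(self, obs):
        """Blend all expert outputs using the gate weights."""
        weights = self.gate(obs)
        outputs = [linear(w, obs) for w in self.expert_w]
        act_dim = len(outputs[0])
        return [sum(g * out[i] for g, out in zip(weights, outputs))
                for i in range(act_dim)]
```

Because the gate depends on the observation, different motion regimes (for example squatting versus walking) can weight the experts differently while the policy remains a single network.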
We incorporate LiDAR odometry and Apple Vision Pro tracking to provide closed-loop global pose feedback, enabling real-time drift correction during teleoperation.
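One way to picture the closed-loop feedback is as a position error computed at every control step and handed to the policy as an observation. The sketch below assumes planar (x, y) positions, with the operator position coming from headset tracking and the robot position and yaw coming from LiDAR odometry, both expressed relative to their starting points. The function name, the rotation into the robot's heading frame, and the clipping threshold `max_norm` are illustrative assumptions rather than details taken from the source.

```python
import math


def tracking_error_in_robot_frame(operator_xy, robot_xy, robot_yaw,
                                  max_norm=0.5):
    """Return the (forward, left) position error the robot should close.

    operator_xy: operator position from headset tracking (assumed input).
    robot_xy, robot_yaw: robot pose from LiDAR odometry (assumed input).
    max_norm: hypothetical clip, in meters, to keep the correction bounded.
    """
    # Error in the shared world frame.
    dx = operator_xy[0] - robot_xy[0]
    dy = operator_xy[1] - robot_xy[1]

    # Rotate by -yaw so the error is expressed in the robot's heading frame.
    c, s = math.cos(robot_yaw), math.sin(robot_yaw)
    ex = c * dx + s * dy
    ey = -s * dx + c * dy

    # Clip large errors so a tracking glitch cannot demand a huge step.
    norm = math.hypot(ex, ey)
    if norm > max_norm:
        scale = max_norm / norm
        ex, ey = ex * scale, ey * scale
    return ex, ey
```

Feeding this error back at every step is what distinguishes closed-loop tracking from open-loop execution, where small per-step errors accumulate into drift.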
We curate a large-scale dataset, CLONED, by enhancing a subset of the AMASS dataset with sampled hand orientations and additional motion-capture data, ensuring robust generalization to dexterous and dynamic whole-body motions.
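As a rough illustration of what sampling hand orientations could look like, the sketch below applies one random rotation offset to every wrist orientation in a motion clip. Everything here is an assumption made for the example: the quaternion layout (w, x, y, z), the one-offset-per-clip scheme, the 30-degree bound, and the function names. The source states only that hand orientations are sampled.

```python
import math
import random


def random_rotation_quat(rng, max_angle_rad):
    """Sample a unit quaternion (w, x, y, z) with a random axis and an
    angle drawn uniformly from [0, max_angle_rad]."""
    axis = [rng.gauss(0.0, 1.0) for _ in range(3)]
    norm = math.sqrt(sum(a * a for a in axis))
    axis = [a / norm for a in axis]
    half = rng.uniform(0.0, max_angle_rad) / 2.0
    return (math.cos(half),
            math.sin(half) * axis[0],
            math.sin(half) * axis[1],
            math.sin(half) * axis[2])


def quat_mul(q, r):
    """Hamilton product of two quaternions in (w, x, y, z) order."""
    w1, x1, y1, z1 = q
    w2, x2, y2, z2 = r
    return (w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
            w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
            w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
            w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2)


def augment_wrist_orientations(wrist_quats, rng,
                               max_angle_rad=math.radians(30)):
    """Rotate every frame's wrist orientation by one shared random offset,
    so the clip stays temporally smooth (hypothetical scheme)."""
    offset = random_rotation_quat(rng, max_angle_rad)
    return [quat_mul(q, offset) for q in wrist_quats]
```

Calling `augment_wrist_orientations` several times with the same clip and a shared `random.Random` instance yields several variants of that clip, each with a different hand orientation.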