“True spatial interaction” in AR glasses means the system can understand real 3D space, track the user’s viewpoint precisely, and then render content and support interaction with it in a way that feels physically correct, especially when the user needs accurate depth (near vs far), scale, and alignment. For depth-critical workflows, this must work reliably while the user is in motion.

1) What makes interaction “spatial”?
A. 6-DoF tracking (pose)
The glasses continuously estimate the user’s head pose:
- Position in 3D space (x, y, z)
- Orientation (roll/pitch/yaw)
Depth-critical workflows depend on low drift and low latency so that virtual objects stay “stuck” to the real world.
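A 6-DoF pose is just a 3D position plus an orientation. As a minimal sketch (the `HeadPose` class, the (w, x, y, z) quaternion convention, and the numbers are illustrative, not from any particular SDK), here is how a tracked pose maps world-space points into the head’s frame:

```python
import numpy as np

def quat_to_matrix(q):
    """Convert a unit quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

class HeadPose:
    """6-DoF pose: position (x, y, z) plus orientation as a unit quaternion."""
    def __init__(self, position, orientation):
        self.position = np.asarray(position, dtype=float)
        self.rotation = quat_to_matrix(orientation)

    def world_to_head(self, point_world):
        """Express a world-space point in the head's coordinate frame."""
        return self.rotation.T @ (np.asarray(point_world) - self.position)

# Identity orientation, head 1.6 m above the origin:
pose = HeadPose([0.0, 1.6, 0.0], [1.0, 0.0, 0.0, 0.0])
print(pose.world_to_head([0.0, 1.6, -2.0]))  # a point 2 m straight ahead
```

The renderer evaluates this transform every frame; drift or latency in `position`/`rotation` shows up directly as virtual objects sliding off their real-world locations.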
B. World understanding (mapping)
The system builds a representation of the environment:
- Feature points / SLAM map
- Planes (walls/floors) and sometimes meshes
- Recognized objects or anchors (e.g., “this specific machine part”)
This provides a coordinate frame so a virtual object can be placed at a specific real location.
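The practical payoff of the map is a shared coordinate frame: content is defined relative to an anchor, and the anchor’s pose maps it into the world. A minimal sketch (the `anchor_to_world` helper and the example poses are made up for illustration):

```python
import numpy as np

def anchor_to_world(anchor_position, anchor_rotation, local_offset):
    """Map a point defined relative to an anchor into world coordinates."""
    return anchor_rotation @ np.asarray(local_offset) + np.asarray(anchor_position)

# Anchor on a recognized machine part at (2, 0, 1), rotated 90 deg about y:
c, s = 0.0, 1.0  # cos/sin of 90 degrees
R = np.array([[c, 0.0, s],
              [0.0, 1.0, 0.0],
              [-s, 0.0, c]])
label_world = anchor_to_world([2.0, 0.0, 1.0], R, [0.0, 0.1, 0.0])
print(label_world)  # label floats 10 cm above the anchor, in world space
```

Because the label is stored in anchor-local coordinates, re-localizing the anchor (e.g., after a tracking correction) automatically moves the content with it.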
C. Anchoring & stability
When you “place” something (a label, tool guide, 3D model), it must remain fixed relative to the real scene even when you move. Depth-critical tasks fail when anchors slide or scale incorrectly.
2) What makes it “depth-critical”?
Depth-critical means errors in depth translate directly into wrong actions, for example:
- Aligning fasteners or parts
- Drilling/cutting path guidance
- Training where correct positioning matters
- Medical/procedural-like guidance (even if not a medical device)
So the system must deliver:
- Accurate relative depth (near vs far)
- Correct scale (1:1 size or known calibration)
- Consistent parallax (stereo cues + correct rendering)
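To see why these requirements bite, consider how calibration errors compound with distance. A back-of-the-envelope sketch (the simple additive error model and the numbers are illustrative assumptions, not measured data):

```python
def placement_error(distance_m, scale_error_fraction, depth_error_m=0.0):
    """Worst-case positional error when aligning to a virtual guide:
    a scale miscalibration grows linearly with working distance,
    while a depth estimation error adds on directly."""
    return distance_m * scale_error_fraction + depth_error_m

# A 2% scale error plus 3 mm of depth error at a 0.5 m working distance:
err = placement_error(0.5, 0.02, 0.003)
print(f"{err * 1000:.0f} mm")  # 13 mm: already too coarse for fastener alignment
```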
3) How stereo/binocular helps depth-critical workflows
Binocular glasses can render stereoscopic depth:
- Each eye sees a slightly different image, matching real-world parallax
- This improves perceived depth and makes “reach/align” tasks more natural
However, stereo only helps if:
- Optical alignment is correct
- Tracking is good
- Rendering matches the user’s actual viewpoint (latency matters)
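The geometry behind stereo rendering is simple: disparity between the two eye images shrinks with distance, which is exactly why optical misalignment and latency corrupt near-field depth cues the most. A sketch assuming a pinhole camera model (the IPD and focal-length values are typical placeholders):

```python
def stereo_disparity_px(depth_m, ipd_m=0.063, focal_px=1000.0):
    """Horizontal disparity (in pixels) between the left- and right-eye
    images for a point at the given depth, under a pinhole model."""
    return focal_px * ipd_m / depth_m

for d in (0.5, 1.0, 2.0, 5.0):
    print(f"{d} m -> {stereo_disparity_px(d):.1f} px")
```

At arm's length the disparity is on the order of a hundred pixels, so even a few pixels of per-eye misalignment is a noticeable depth error; at 5 m the same misalignment barely matters.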
4) Interaction methods that use 3D space
To interact “in space,” the system needs a way to target 3D points/objects:
A. Gaze + ray casting (common)
- Eye tracking gives a gaze direction
- The system casts a ray into the reconstructed scene
- It determines the 3D point you’re looking at (for selection, grabbing, “tap in space”)
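At its core, gaze targeting is a ray-geometry intersection test. A minimal sketch against a single reconstructed plane (real systems intersect against meshes or point clouds; `ray_plane_intersection` and the numbers are illustrative):

```python
import numpy as np

def ray_plane_intersection(origin, direction, plane_point, plane_normal):
    """Return the 3D point where a gaze ray hits a reconstructed plane,
    or None if the ray is parallel to it or points away from it."""
    origin = np.asarray(origin, dtype=float)
    direction = np.asarray(direction, dtype=float)
    denom = np.dot(plane_normal, direction)
    if abs(denom) < 1e-9:      # ray parallel to the plane
        return None
    t = np.dot(plane_normal, np.asarray(plane_point) - origin) / denom
    if t < 0:                  # plane is behind the viewer
        return None
    return origin + t * direction

# Gaze from head height, angled down-forward, onto the floor plane (y = 0):
hit = ray_plane_intersection([0, 1.6, 0], [0, -1.6, -2.0], [0, 0, 0], [0, 1, 0])
print(hit)  # the 3D point the user is "looking at"
```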
B. Controller/hand tracking
- Hand/controller pose is tracked in 3D
- The user “grabs” virtual objects or aligns tools
- Constraints help prevent unrealistic interactions (e.g., snapping to edges/axes)
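One common constraint is snapping a hand-driven translation to its dominant world axis when the motion is clearly axis-aligned. A sketch (the `snap_to_axis` helper and its 0.8 dominance threshold are illustrative assumptions):

```python
import numpy as np

def snap_to_axis(delta, threshold=0.8):
    """Constrain a hand-driven translation to its dominant world axis
    when that axis accounts for most of the motion; otherwise pass the
    motion through unchanged (prevents unintended off-axis drift)."""
    delta = np.asarray(delta, dtype=float)
    norm = np.linalg.norm(delta)
    if norm == 0.0:
        return delta
    dominant = int(np.argmax(np.abs(delta)))
    if abs(delta[dominant]) / norm >= threshold:
        snapped = np.zeros(3)
        snapped[dominant] = delta[dominant]
        return snapped
    return delta  # free-form movement when no single axis dominates

print(snap_to_axis([0.002, 0.05, 0.001]))  # mostly-vertical drag -> pure y motion
```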
C. Spatial gestures
- Pinch to select
- Grab to move
- Rotate to align
- Confirm/cancel actions with air taps or hand poses
Depth-critical benefit: the “target” is a 3D coordinate, not a 2D screen pixel.
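A pinch selector, for example, reduces to a fingertip-distance test, usually with hysteresis so the state doesn’t flicker at the threshold. A sketch (the `PinchDetector` class and its centimeter-scale thresholds are illustrative, not from any hand-tracking API):

```python
class PinchDetector:
    """Detect a pinch-select gesture from tracked thumb/index fingertip
    positions, with hysteresis between the close and open thresholds."""
    def __init__(self, close_m=0.015, open_m=0.03):
        self.close_m = close_m   # fingertips closer than this -> pinching
        self.open_m = open_m     # fingertips farther than this -> released
        self.pinching = False

    def update(self, thumb_tip, index_tip):
        dist = sum((a - b) ** 2 for a, b in zip(thumb_tip, index_tip)) ** 0.5
        if not self.pinching and dist < self.close_m:
            self.pinching = True      # select
        elif self.pinching and dist > self.open_m:
            self.pinching = False     # release
        return self.pinching

det = PinchDetector()
print(det.update((0.0, 0.0, 0.0), (0.05, 0.0, 0.0)))  # False: fingers apart
print(det.update((0.0, 0.0, 0.0), (0.01, 0.0, 0.0)))  # True: pinch begins
```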
5) The rendering pipeline must be correct
For depth-critical workflows, rendering must be physically consistent:
- Compute gaze/head pose at render time
- Project 3D anchors into the display
- Apply occlusion handling (virtual object hidden behind real objects when appropriate)
- Maintain correct depth ordering (so a virtual tool guide doesn’t “float through” the real tool)
This often requires:
- Depth map estimation (from cameras)
- Occlusion meshes or learned depth
- Accurate camera calibration
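Given an estimated real-world depth map, occlusion handling comes down to a per-pixel depth comparison at composite time. A toy sketch (the array shapes and depth values are illustrative):

```python
import numpy as np

def composite_with_occlusion(virtual_rgb, virtual_depth, real_rgb, real_depth):
    """Show a virtual pixel only where it is nearer than the real scene,
    so virtual content is correctly hidden behind real objects."""
    virtual_in_front = virtual_depth < real_depth
    mask = virtual_in_front[..., None]       # broadcast over RGB channels
    return np.where(mask, virtual_rgb, real_rgb)

# 1x2 toy image: virtual content at 1 m; real surfaces at 0.5 m and 2 m:
v_rgb = np.full((1, 2, 3), 255)              # white virtual overlay
r_rgb = np.zeros((1, 2, 3), dtype=int)       # black camera image
v_depth = np.array([[1.0, 1.0]])
r_depth = np.array([[0.5, 2.0]])
print(composite_with_occlusion(v_rgb, v_depth, r_rgb, r_depth))
# left pixel stays black (real object in front); right pixel shows the overlay
```

Errors in `real_depth` translate directly into wrong occlusion boundaries, which is why depth estimation quality matters so much here.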
6) Latency and prediction (why timing is critical)
If pose updates arrive late:
- The virtual object appears to lag behind
- Stereo depth cues can become uncomfortable
- Alignment tasks become error-prone
So systems use:
- Sensor fusion (IMU + vision)
- Motion prediction (estimate where the user will be at display time)
- Late latching/reprojection (update pose as late as possible before scanout)
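The prediction step can be as simple as extrapolating the fused pose forward by the expected photon latency. A deliberately reduced sketch, translational motion only with a constant-velocity model (real systems also predict orientation from IMU angular rates):

```python
import numpy as np

def predict_position(position, velocity, latency_s):
    """Constant-velocity prediction of head position at display (scanout)
    time. Real pipelines fuse IMU + vision and predict the full 6-DoF pose;
    this sketch shows only the translational part."""
    return np.asarray(position) + np.asarray(velocity) * latency_s

# Head moving 0.5 m/s to the right; photons reach the eye ~15 ms from now:
predicted = predict_position([0.0, 1.6, 0.0], [0.5, 0.0, 0.0], 0.015)
print(predicted)  # render from where the head WILL be, not where it was sampled
```

Late latching then re-samples this prediction as close to scanout as possible, and reprojection warps the already-rendered frame to the final pose.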
7) Typical depth-critical workflow examples (how it plays out)
A. Maintenance/assembly “Place the part here”
- The system detects the assembly area/anchors (or recognizes the part)
- Shows a 3D placement ghost/guide at the correct location
- User aligns screws/parts using gaze + hand/controller targeting
- Occlusion + stereo help confirm “in/out” depth
B. Drilling/cutting “Follow the drill path”
- A path is computed in 3D, tied to the real surface
- As you move, the path remains locked to the surface
- Depth accuracy ensures the virtual trajectory corresponds to the physical cut/drill location
C. Training simulation for correct positioning
- Virtual anatomy/tools placed in real-ish space
- Scoring based on 3D deviation tolerances
- Binocular (stereo) rendering improves realism, but tracking accuracy is the bigger determinant
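Scoring against 3D deviation tolerances is a distance computation plus a tolerance curve. A sketch (the `score_placement` function, its tolerances, and the linear falloff are illustrative assumptions):

```python
import numpy as np

def score_placement(actual, target, tolerance_m=0.005, fail_m=0.02):
    """Score a trainee's tool placement: full marks inside the tolerance,
    linear falloff out to the failure distance, zero beyond it."""
    deviation = np.linalg.norm(np.asarray(actual) - np.asarray(target))
    if deviation <= tolerance_m:
        return 1.0
    if deviation >= fail_m:
        return 0.0
    return 1.0 - (deviation - tolerance_m) / (fail_m - tolerance_m)

print(score_placement([0.003, 0.0, 0.0], [0.0, 0.0, 0.0]))   # within tolerance
print(score_placement([0.0125, 0.0, 0.0], [0.0, 0.0, 0.0]))  # partial credit
```

Note that any tracking or scale error from the failure modes below feeds straight into `deviation`, so the scoring is only as trustworthy as the tracking.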
8) What could still go wrong (key failure modes)
- SLAM drift → anchors slowly shift, causing depth errors
- Scale miscalibration → virtual objects seem too big/small
- Bad occlusion → depth looks wrong (virtual tool appears in front when it should be behind)
- Stereo/vergence mismatch → eye strain or reduced confidence
- Tracking loss/lighting changes → sudden jumps or inability to anchor