Image Input (sticky context)
No image — upload or capture to set visual context
Location (sticky context)
Voice (continuous listening)
Idle
700ms
8
Manual Send
In listening mode, utterances are sent automatically. Use Manual Send to send only an image or location.
Log
Start listening and speak. Context frames will appear as you talk.