Aha, I think I spotted the trick in the live demo: every time they used the "video feed", they prompted the model explicitly by saying things like:
- "What are you seeing now"
- "I'm showing this to you now"
etc.
The one time he didn't prime the model to take a snapshot this way was when it described the "table" (a stale snapshot, since the phone had been lying on the table / pointed at it), so that might explain the glitch.
Yeah, the way the app currently works is that ChatGPT-4o only sees up to the moment of your last comment.
For example, I tried asking ChatGPT-4o to commentate a soccer game, but I got pretty bad hallucinations, since the model couldn't see any new video coming in after my instruction.
So when using ChatGPT-4o you’ll have to point the camera first and then ask your question - it won’t work to first ask the question and then point the camera.
(I was able to play with the model early because I work at OpenAI.)
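A minimal sketch of the behavior described above, purely my guess at the mechanism (the `VideoChatSession` class and its methods are hypothetical, not OpenAI's actual implementation): frames keep arriving, but only those captured up to the moment of your last message make it into the model's context.

```python
from dataclasses import dataclass, field

@dataclass
class VideoChatSession:
    """Toy model of the snapshot behavior: the model's context only
    includes frames captured at or before the user's last message."""
    frames: list = field(default_factory=list)  # (timestamp, frame) pairs

    def add_frame(self, t, frame):
        self.frames.append((t, frame))

    def context_for_message(self, message_time):
        # Frames arriving after the user's message never reach the model,
        # which is why "commentate this game" produces hallucinations.
        return [f for t, f in self.frames if t <= message_time]

session = VideoChatSession()
session.add_frame(0, "table")
session.add_frame(1, "soccer field")
session.add_frame(2, "goal scored")  # arrives after the question at t=1
print(session.context_for_message(1))  # -> ['table', 'soccer field']
```

This is why pointing the camera first and asking second works: the question timestamp falls after the relevant frames.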