
It used to be that the bots had a short context window and easily got confused by past context, so it was much better to make a new chat every now and then to keep the thread on track.

The opposite is true now. The context windows are enormous, and the bots are able to stay on task extremely well. They're able to utilize any previous context you've provided as part of the conversation for the new task, which improves their performance.

The new pattern I am using is a master chat that I only ever change if I am doing something entirely different.



That’s cool. I know context windows are arbitrarily larger now because consumers think that a larger window = better, but I think the point that the model can’t actually use the whole window effectively still stands?

I still find LLMs perform best with a potent and focused context to work with, and performance goes down quite significantly the more context they have.

What’s your experience been?


I worked at a startup experimenting with gemini-2.0-flash (the year-old model), using its full 1M context window to query technical documents. We found it extremely successful at needle-in-a-haystack type problems.
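
Roughly the kind of setup I mean, sketched with the google-generativeai Python client (the file, the question, and the API key here are just placeholders, not our actual stack):

    import google.generativeai as genai

    genai.configure(api_key="YOUR_API_KEY")
    model = genai.GenerativeModel("gemini-2.0-flash")

    # Load a large technical document; the whole thing fits in the 1M-token window
    with open("manual.txt") as f:
        document = f.read()

    # Needle-in-a-haystack style query: the answer is buried somewhere in the document
    prompt = (
        "You are answering questions about the document below.\n\n"
        f"<document>\n{document}\n</document>\n\n"
        "Question: What is the maximum rated operating temperature?"
    )

    response = model.generate_content(prompt)
    print(response.text)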

As we migrated to newer models (gemini-3.0 and the o4-mini models), we found they performed even better with x00k tokens. Our system prompt grew to about 20k tokens and the bots were able to handle it perfectly. Our issue became time to first token with large context, rather than bot quality.
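
Time to first token is easy to measure yourself if you stream the response; a minimal sketch with the same client (the prompt is a placeholder for your own system prompt plus context):

    import time
    import google.generativeai as genai

    genai.configure(api_key="YOUR_API_KEY")
    model = genai.GenerativeModel("gemini-2.0-flash")

    prompt = "..."  # system prompt + a few hundred thousand tokens of document context

    start = time.monotonic()
    stream = model.generate_content(prompt, stream=True)

    # Time to first token: how long until the first streamed chunk arrives
    chunks = iter(stream)
    first_chunk = next(chunks)
    print(f"time to first token: {time.monotonic() - start:.2f}s")

    # Drain the rest of the stream to get the full answer
    text = first_chunk.text + "".join(chunk.text for chunk in chunks)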

The ultra-large 1M+ Llama models were reported to be ineffective at >1M context, but at that point it becomes cost-prohibitive to use anyway.

I am continuing to have success using Cursor's Auto model and GPT-5.1 with extremely long conversations. I use different chats for different problems more for my own compartmentalisation of thoughts than as a necessity for the bot.



