bigmadshoe, 8 months ago, on: Claude 4

They often publish "needle in a haystack" benchmarks that look very good, but my subjective experience with a large context is always bad. Maybe we need better benchmarks.