
One of the dangers of automated tests is that if you use an LLM to generate them, it can easily end up testing the implemented behavior rather than the desired behavior (sketched in the example below). Tell it to loop until the tests pass, and it will do exactly that if unsupervised.

And you can’t even treat the implementation as a black box by using a different LLM, because all the frontier models are trained toward similar biases: confidence and obsequiousness when making assumptions about the spec!

Verifying the solution in agentic coding is not nearly as easy as it sounds.
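A hypothetical Python sketch of that failure mode (the function, the bug, and the values are all invented for illustration): a test derived from the implementation simply locks in whatever the code already does, while a test derived from the spec catches the discrepancy.

    # Hypothetical example: the spec says orders of $100 *or more* get a
    # 10% discount, but the implementation uses a strict comparison.
    def apply_discount(total: float) -> float:
        if total > 100:  # bug: spec says ">= 100"
            return total * 0.9
        return total

    # A test generated by reading the implementation locks in the bug:
    def test_discount_generated_from_implementation():
        assert apply_discount(100) == 100  # passes, but contradicts the spec

    # A test written from the spec (desired behavior) catches it:
    def test_discount_from_spec():
        assert apply_discount(100) == 90  # fails until the bug is fixed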



Not only can it easily do this, I've found that Claude models do it as a matter of course. My strategy now is to write either the test or the implementation myself and use Claude for the other one. That keeps it a lot more honest.
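A minimal sketch of that split, with invented names: the human-authored test acts as the spec and stays fixed, and whatever implementation the model produces has to satisfy it, rather than the test being regenerated to match the code.

    import re

    # Human-written test: authored first and kept out of the model's hands,
    # so it can't be quietly adjusted until it passes.
    def test_make_slug():
        assert make_slug("Hello World") == "hello-world"
        assert make_slug("Rock & Roll!") == "rock-roll"
        assert make_slug("a  --  b") == "a-b"

    # The model is then asked to write make_slug() against that test; a
    # plausible implementation it might produce:
    def make_slug(text: str) -> str:
        text = text.lower()
        text = re.sub(r"[^a-z0-9]+", "-", text)  # collapse non-alphanumerics into hyphens
        return text.strip("-")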



