It is primarily a principal-agent problem, with a hint of marshmallow test.
If you are a developer who is not writing the documents for consumption by AI, you are primarily writing documents for someone who is not you; you do not know what this person will need or if they will ever even look at them.
They may, of course, help you, but you may not realize that, or you may lack the time or discipline.
If you are writing them because the AI using them will help you, you have a very strong and immediate incentive to document the necessary information. You also have the benefit of a short feedback loop.
Side note: thanks to LLMs' penchant for wiping out comments, I have a lot more docs these days and far fewer comments.
I think it's not at all a marshmallow test; quite the opposite - docs used to be written way, way in advance of their consumption. The problem that implies is twofold. Firstly, and less significantly, it's just not a great return on investment to spend tons of effort now to maybe help slightly in the far future.
But the real problem with docs is that for MOST use cases, the audience and context of the readers matter HUGELY. Most docs are bad because we can't predict those. People waste ridiculous amounts of time writing docs that nobody reads or nobody needs, based on hypotheses about the future that turn out to be false.
And _that_ is completely different when you're writing context-window documents. These aren't really documents describing a codebase, or the context the codebase exists in, in some timeless fashion; they're better understood as part of a _current_ plan for action on an acute, real concern. They're battle-tested in a way docs only rarely are. And as a bonus, sure, they're retainable and might help with the next problem too, but that's not why they work; they work because they're useful, in an almost testable way, right away.
The exceptions to this pattern kind of prove the rule. People have for years done better at documenting isolatable dependencies, i.e. libraries, precisely because those sit at boundaries where it's easier to make decent predictions about future usage, and often because those docs have a far larger readership, so it's more worth risking wasted effort on a hypothesis about the future that turns out to be wrong - the cost/benefit is skewed towards the benefit by sheer numbers and the kind of code it is.
Having said that, the dust hasn't settled on the best way to distill context like this. It'd be a mistake to overanalyze the current situation and conclude that documentation is certain to be the long-term answer - it's definitely helpful now, but it's certainly conceivable that more automated and structured representations will emerge, perhaps in forms better suited for machine consumption that look a little more alien to us than conventional docs.
I know this is highly controversial, but I now leave the comments in. My theory is that the “probability space” the LLM is writing code in can’t help but produce them, so if I leave them, the next LLM that reads the code will start in the same space. Maybe it’s too much, but currently I just want the code to be right, and I’ve let go of the exact wording of comments/variables/types to move faster.
I think the code comments straight up just help understanding, whether human or AI.
There's a piece of common knowledge that NBA basketball players could all hit over 90% on free throws if they shot underhand (granny style). But for pride reasons, they don't shoot underhand. Shaq shot just 52%, even though it would have been free points if he could easily shoot better.
I suspect there's similar things in software engineering. I've seen plenty of comments on HN about "adding code comments like a junior software engineer" or similar sentiment. Sure, there are legitimate gripes about comments (like how they can be misleading if you update the code without changing the comment, etc), but I strongly suspect they increase comprehension of code overall.
I can't speak to LLMs, but one of my first tasks was debugging a race condition in a piece of software. (I had no idea that it was a race condition, or even what part of the code it was in.) I spent months babysitting the service and reading the codebase. When I finally fixed the issue, the source turned out to be a comment that said the opposite of what the code actually did. The code was a very convoluted one-line guard involving a ternary if and several || / && conditions. If the comment hadn't been there, I think I would've read the code sooner and realized the issue.
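For a flavor of it, a made-up sketch (invented names, not the actual code) of a guard whose comment says the opposite of what it does:

    // Hypothetical sketch with invented names - not the real code, just the shape of it.
    interface Job {
      ready: boolean;
      locked: boolean;
      retries: number;
    }

    function canProcess(job: Job, draining: boolean): boolean {
      // Process the job if it's ready and not locked.
      // (That comment is the misleading part: while draining, this actually rejects
      // ready, unlocked jobs, and otherwise lets locked ones through once retries pile up.)
      return draining ? !(job.ready && !job.locked) : job.locked || job.retries > 3;
    }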
Personally, I remove redundant comments AI adds specifically to demonstrate that I have reviewed the code and believe that the AI's description is accurate. In many cases AIs will even add comments that only make sense as a response to my prompt and don't make any sense in-context.
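The sort of thing I mean, as a made-up example (not from a real codebase):

    // Made-up example of the pattern (invented names). The first comment just
    // restates the code; the second only makes sense as a reply to my prompt.
    function processAll(items: string[]): number {
      let retryCount = 0;

      // Increment the retry counter by one
      retryCount += 1;

      // Switched to a for...of loop as requested
      for (const item of items) {
        console.log(item);
      }

      return retryCount;
    }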
That's the kind of thing LLMs would HELP with, though.
Comments may go out of date, but LLMs can quickly generate comments that are up to date. LLMs are more likely to prevent the situation that you described.
> In many cases AIs will even add comments that only make sense as a response to my prompt and don't make any sense in-context.
I hate this analogy. NBA players can all hit 90% of their free throws shooting overhand too; it's just that some of them are much worse at handling the pressure and pace change of a game context.
The underhanded throw is mechanically just better for free throws. It's much easier to put backspin on the ball, for example. It's just a shot that doesn't help anywhere else.
What? Shaq shot free throws at 85% in practice. Players with good mechanics like Curry wouldn't be caught dead shooting 85% during the season; he's always 90%+ at free throws. Curry would probably shoot free throws at 99.9% in practice; there are plenty of stories of him swishing 100 3-pointers in a row in practice.
I'm not saying Shaq would have shot free throws as well as "90% in game and 99.9% in practice" if he'd thrown them underhanded, but clearly Shaq had mechanics issues.
I like comments about intent, about the why. The generators are really bad at intent; they just write dross comments about the how. The worst part is that they're not accustomed to comments about intent and tend to drop those!
Say I wrote a specific comment explaining why this fencepost here needs special consideration. The agent will come through and replace that reasoned comment with "Add one to index".
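Roughly this kind of thing (invented example, not from a real codebase):

    // Hypothetical example with invented names. The first comment records the why;
    // the parenthetical shows the dross it tends to get replaced with.
    function countDataRows(rowCount: number): number {
      // rowCount includes the header row, but the exporter downstream expects
      // only data rows, so drop it here rather than in every call site.
      // (After an agent pass this often just reads: "Subtract one from rowCount".)
      return rowCount - 1;
    }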
I think it's very valuable. If there are a bunch of redundant comments, I know I haven't actually validated that this code does anything useful. I go through line by line and delete all the comments while validating that they do in fact reflect what the code does.