Whats interesting to me is that these gpt-5.3 and opus-4.6 are diverging philosophically and really in the same way that actual engineers and orgs have diverged philosophically
With Codex (5.3), the framing is an interactive collaborator: you steer it mid-execution, stay in the loop, course-correct as it works.
With Opus 4.6, the emphasis is the opposite: a more autonomous, agentic, thoughtful system that plans deeply, runs longer, and asks less of the human.
that feels like a reflection of a real split in how people think llm-based coding should work...
some want tight human-in-the-loop control and others want to delegate whole chunks of work and review the result
Interested to see if we eventually see models optimize for those two philosophies and 3rd, 4th, 5th philosophies that will emerge in the coming years.
Maybe it will be less about benchmarks and more about different ideas of what working-with-ai means
> With Codex (5.3), the framing is an interactive collaborator: you steer it mid-execution, stay in the loop, course-correct as it works.
> With Opus 4.6, the emphasis is the opposite: a more autonomous, agentic, thoughtful system that plans deeply, runs longer, and asks less of the human.
Ain't the UX is the exact opposite? Codex thinks much longer before gives you back the answer.
I've also had the exact opposite experience with tone. Claude Code wants to build with me, and Codex wants to go off on its own for a while before returning with opinions.
well with the recent delays i can easily find claude code going off on it's own for 20 minutes and have no idea what it's going to come back with. but one time it overflowed it's context on a simple question, and then used up the rest of my session window. in a way a lot of ai assistants have ime have this awkward thing where they complicate something in a non-visible and think about it for a long time burning up context before coming up with a summary based upon some misconception.
The key is a well defined task with strong guardrails. You can add these to your agents file over time or you can probably just find someone's online to copy the basics from. Any time you find it doing something you didn't expect or don't like, add guardrails to prevent that in future. Claude hooks are also useful here, along with the hookify plugin to create them for you based on the current conversation.
For complex tasks I ask ChatGPT or Grok to define context then I take it to Claude for accurate execution. I also created a complete pipeline to use locally and enrich with skills, agents, RAG, profiles. It is slower but very good. There is no magic, the richer the context window the more precise and contained the execution.
In terms of 'tone', I have been very impressed with Qwen-code-next over the last 2 days, especially as I have it running locally on a single modest 4090.
Easiest way I know is to just use LMStudio. Just download and press play :). Optional, but recommended, increase the context length to 262144 if you have the DRAM available. It will definitely get slower as your interaction prolongs, but (at least for me) still tolerable speed.
Codex now lets you tell the LLM tgings in the middle of its thinking without interrupting it, so you can read the thinking traces and tell it to change course if it's going off track.
That just seems like a UI difference. I've always interrupted claude code added a comment and it's continued without much issue. Otherwise if you just type the message is queued for next. There's no real reason to prefer one over the other except it sounds like codex can't queue messages?
Codex can queue messages, but the queue only gets flushed once the agent is done with whatever it was working on, whereas Claude will read messages and adjust accordingly in the middle of whatever it is doing. It sounds like OP is saying that Codex can now do this latter bit as well.
The problem is if you're using subagents, the only way to interject is often to press escape multiple times which kills all the running subagents. All I wanted to do was add a minor steering guideline.
That is so annoying too because it basically throws away all the work the subagent did.
Another thing that annoys me is the subagents never output durable findings unless you explicitly tell their parent to prompt the subagent to “write their output to a file for later reuse” (or something like that anyway)
I have no idea how but there needs to be ways to backtrack on context while somehow also maintaining the “future context”…
This is most likely an inference serving problem in terms of capacity and latency given that Opus X and the latest GPT models available in the API have always responded quickly and slowly, respectively
I'm personally 100% convinced (assuming prices stay reasonable) that the Codex approach is here to stay.
Having a human in the loop eliminates all the problems that LLMs have and continously reviewing small'ish chunks of code works really well from my experience.
It saves so much time having Codex do all the plumbing so you can focus on the actual "core" part of a feature.
LLMs still (and I doubt that changes) can't think and generalize. If I tell Codex to implement 3 features he won't stop and find a general solution that unifies them unless explicitly told to. This makes it kinda pointless for the "full autonomy" approach since effecitly code quality and abstractions completely go down the drain over time. That's fine if it's just prototyping or "throwaway" scripts but for bigger codebases where longevity matters it's a dealbreaker.
I'm personally 100% convinced of the opposite, that it's a waste of time to steer them. we know now that agentic loops can converge given the proper framing and self-reflectiveness tools.
Converge towards what though... I think the level of testing/verification you need to have an LLM output a non-trivial feature (e.g. Paxos/anything with concurrency, business logic that isn't just "fetch value from spreadsheet, add to another number and save to the database") is pretty high.
In this new world, why stop there? It would be even better if engineers were also medical doctors and held multiple doctorate degrees in mathematics and physics and also were rockstar sales people.
It's not a waste of time, it's a responsibility. All things need steering, even humans -- there's only so much precision that can be extrapolated from prompts, and as the tasks get bigger, small deviations can turn into very large mistakes.
There's a balance to strike between micro-management and no steering at all.
Most prompts we give are severely information-deficient. The reason LLMs can still produce acceptable results is because they compensate with their prior training and background knowledge.
The same applies to verification: it's fundamentally an information problem.
You see this exact dynamic when delegating work to humans. That's why good teams rely on extremely detailed specs. It's all a game of information.
Having prompts be information deficient is the whole point of LLMs. The only complete description of a typical programming problem is the final code or an equivalent formal specification.
Does the AI agent know what your company is doing right now, what every coworker is working on, how they are doing it, and how your boss will change priorities next month without being told?
If it really knows better, then fire everyone and let the agent take charge. lol
For me, it still asks for confirmation at every decision when using plans. And when multiple unforeseen options appear, it asks again. I don’t think you’ve used Codex in a while.
A significant portion of engineering time is now spent ensuring that yes, the LLM does know about all of that. This context can be surfaced through skills, MCP, connectors, RAG over your tools, etc. Companies are also starting to reshape their entire processes to ensure this information can be properly and accurately surfaced. Most are still far from completing that transformation, but progress tends to happen slowly, then all at once.
This sounds like never. Most businesses are still shuffling paper and couldn’t give you the requirements for a CRUD app if their lives depended on it.
You’re right, in theory, but it’s like saying you could predict the future if you could just model the universe in perfect detail. But it’s not possible, even in theory.
If you can fully describe what you need to the degree ambiguity is removed, you’ve already built the thing.
If you can’t fully describe the thing, like some general “make more profit” or “lower costs”, you’re in paper clip maximizer territory.
> If you can fully describe what you need to the degree ambiguity is removed, you’ve already built the thing.
Trying to get my company to realize this right now.
Probably the most efficient way to work, would be on a video call including the product person/stakeholder, designer, and me, the one responsible for the actual code, so that we can churn through the now incredibly fast and cheap implementation step together in pure alignment.
You could probably do it async but it’s so much faster to not have to keep waiting for one another.
Maybe some day, but as a claude code user it makes enough pretty serious screw ups, even with a very clearly defined plan, that I review everything it produces.
You might be able to get away without the review step for a bit, but eventually (and not long) you will be bitten.
I use that to feed back into my spec development and prompting and CI harnesses, not steering in real time.
Every mistake is a chance to fix the system so that mistake is less likely or impossible.
I rarely fix anything in real time - you review, see issues, fix them in the spec, reset the branch back to zero and try again. Generally, the spec is the part I develop interactively, and then set it loose to go crazy.
This feels, initially, incredibly painful. You're no longer developing software, you're doing therapy for robots. But it delivers enormous compounding gains, and you can use your agent to do significant parts of it for you.
> You're no longer developing software, you're doing therapy for robots.
Or, really, hacking in "learning", building your knowhow-base.
> But it delivers enormous compounding gains, and you can use your agent to do significant parts of it for you.
Strong yes to both, so strong that it's curious Claude Code, Codex, Claude Cowork, etc., don't yet bake in an explicit knowledge evolution agent curating and evolving their markdown knowledge base:
Unlikely to help with benchmarks. Very likely to improve utility ratings (as rated by outcome improvements over time) from teams using the tools together.
For those following along at home:
This is the return of the "expert system", now running on a generalized "expert system machine".
I assumed you'd build such a massive set of rules (that claude often does not obey) that you'd eat up your context very quickly. I've actually removed all plugins / MCPs because they chewed up way too much context.
It's as much about what to remove as what to add. Curation is the key. Skills also give you some levers to get the kind of context-sensitive instruction you need, though I haven't delved too deeply into them. My current total instruction set is around ~2500 tokens at the moment
Reviewing what it produces once it thinks it has met the acceptance criteria and the test suite passes is very different from wasting time babysitting every tiny change.
True, and that's usually what I'm doing now, but to be honest I'm also giving all of it's code at least a cursory glance.
Some of the things it occasionally does:
- Ignores conventions (even when emphasized in the CLAUDE.md)
- Decides to just not implement tests if gets spins out on them too much (it tells you, but only as it happens and that scrolls by pretty quick)
- Writes badly performing code (N+1)
- Does more than you asked (in a bad way, changing UIs or adding cruft)
- Makes generally bad assumptions
I'm not trying to be overly negative, but in my experience to date, you still need to babysit it. I'm interested though in the idea of using multiple models to have them perform independent reviews to at least flag spots that could use human intervention / review.
Sure, but non of those things requires you to watch it work. They're all easy to pick up on when reviewing a finished change, which ideally should come after it's instructions have had it run linters, run sub agents that verify it has added tests, run sub agents doing a code review.
I don't want to waste my time reviewing a change the model can still significantly improve all by itself. My time costs far more than the models.
you give it tools so it can compile and run the code. then you give it more tools so it can decide between iterations if it got closer to the goal or not. let it evaluate itself. if it can't evaluate something, let it write tests and benchmark itself.
I guarantee that if the criteria is very well defined and benchmarkable, it will do the right thing in X iterations.
(I don't do UI development. I do end-to-end system performance on two very large code bases. my tests can be measured. the measure is very simply binary: better or not. it works.)
I've been using codex for one week and I have been the most productive I have ever been. Small prs, tight rules, I get almost exactly what I want. Things tend to go sideways when scope creeps into my request. But I just close the PR instead of fighting with the agent. In one week: 28 prs, 26 merged. Absolutely unreal.
I will personally never consider using an agent that can't be easily pushed toward working on its own for long periods (hours) at a time. It's a total waste of time for me to babysit the LLM.
I think it's the opposite. Especially considering Codex started out as a web app that offers very little interactivity: you are supposed to drop a request and let it run automatously in a containerized environment; you can then follow up on it via chat --- no interactive code editing.
Fair I agree that was true of early codex and my perception too.. but today there are two announcements that came out and thats what im referring to.
specifically, the GPT-5.3 post explicitly leans into "interactive collaborator" langauge and steering mid execution
OpenAI post: "Much like a colleague, you can steer and interact with GPT-5.3-Codex while it’s working, without losing context."
OpenAI post: "Instead of waiting for a final output, you can interact in real time—ask questions, discuss approaches, and steer toward the solution"
Claude post: "Claude Opus 4.6 is designed for longer-running, agentic work — planning complex tasks more carefully and executing them with less back-and-forth from the user."
When I tried 5.2 Codex in GitHub Copilot it executed some first steps like searching for the relevant files, then it output the number "2" and stopped the response.
On further prompting it did the next step and terminated early again after printing how it would proceed.
It's most likely just a bug in GitHub Copilot, but it seems weird to me that they add models that clearly don't even work with their agentic harness.
I think those OpenAI announcements are mainly because this hasn’t been the case for them earlier, while it has been part of Claude Code since the beginning.
I don’t think there’s something deeply philosophical in here, especially as Claude Code is pushing stronger for asking more questions recently, introduced functionality to “chat about questions” while they’re asked, etc.
Frankly it seems to be that codex is playing catch-up with claude code and claude code is just continuing to move further ahead. The thing with claude code is it will work longer... if you want it to. It's always had good oversight and (at least for me) it builds trust slowly until you are wishing it would do more at once. When I've used codex (it has been getting better) but back in the day it would just do things and say it's done and you're just sitting there wondering "wtf are you doing?". Claude code is more the opposite where you can watch as closely as you want and often you get to a point where you have enough trust and experience with it that you know what it's going to do and don't want to bother.
This kind of sounds like both of them stepping into the other’s turf, to simplify a bit.
I haven’t used Codex but use Claude Code, and the way people (before today) described Codex to me was like how you’re describing Opus 4.6
So it sounds like they’re converging toward “both these approaches are useful at different times” potentially? And neither want people who prefer one way of working to be locked to the other’s model.
> With Opus 4.6, the emphasis is the opposite: a more autonomous, agentic, thoughtful system that plans deeply, runs longer, and asks less of the human.
This feels wrong, I can't comment on Codex, but Claude will prompt you and ask you before changing files, even when I run it in dangerous mode on Zed, I can still review all the diffs and undo them, or you know, tell it what to change. If you're worried about it making too many decisions, you can pre-prompt Claude Code (via .claude/instructions.md) and instruct it to always ask follow up questions regarding architectural decisions.
Sometimes I go out of my way to tell Claude DO NOT ASK ME FOR FOLLOW UPS JUST DO THE THING.
yeah I'm mostly just talking about how they're framing it:
"Claude Opus 4.6 is designed for longer-running, agentic work — planning complex tasks more carefully and executing them with less back-and-forth from the user"
I guess its also quite interesting that how they are framing these projects are opposite from how people currently perceive them and I guess that may be a conscious choice...
I get what you mean now, I like that to be fair, sometimes I want Claude to tell me some architectural options, so I ask it so I can think about what my options are, sometimes I rethink my problem if I like Claudes conclusion.
I usually want the codex approach for code/product "shaping" iteratively with the ai.
Once things are shaped and common "scaling patterns" are well established, then for things like adding a front end (which is constantly changing, more views) then letting the autonomous approach run wild can *sometimes* be useful.
I have found that codex is better at remembering when I ask to not get carried away...whereas claude requires constant reminders.
Did you get those backwards? Codex, Gemini, etc. all wait until the requests are done to accept user feedback. Claude Code allows you to insert messages in between turns.
I think there is another philosophy where the agent is domain specific. Not that we have to invent an entirely new universe for every product or business, but that there is a small amount of semi-customization involved to achieve an ideal agent.
I would much rather work with things like the Chat Completion API than any frameworks that compose over it. I want total control over how tool calling and error handling works. I've got concerns specific to my business/product/customer that couldn't possibly have been considered as part of these frameworks.
Whether or not a human needs to be tightly looped in could vary wildly depending on the specific part of the business you are dealing with. Having a purpose-built agent that understands where additional verification needs to occur (and not occur) can give you the best of both worlds.
Admit I didn't follow the announcements but isn't that a matter of UI? Doesn't seem something that should be baked in the model but in the tooling around it and the instructions you give them. E.g. I've been playing with with GitHub copilot CLI (that despite the bad fame is absolutely amazing) and the same model completely changes its behavior with the prompt. You can have it answer a question promptly or send it on a multi-hour multi-agent exploration writing detailed specs with a single prompt. Or you can have it stop midway for clarification. It all depends on the instructions. Also this is particularly interesting with GitHub billing model as each prompt counts 1 request no matter how many tokens it burns.
It depends honestly. Both are prone to doing the exact opposite of what you asked. Especially with poor context management.
I’ve had both $200 plans and now just have Max x20 and use the $20 ChatGPT plan for an inferior Codex.
My experience (up until today) has always been that Codex acts like that one Sr Engineer that we all know. They are kind of a dick. And will disappear into a dark hole and emerge with a circle when you asked for a pentagon. Then let you know why edges are bad for you.
And yes, Anthropic is pivoting hard into everything agentic. I bet it’s not too long before Claude Code stops differentiating models. I had Opus blow 750k tokens on a single small task.
I think it's just both companies building/ marketing to the strength of their competitor. As general perception has been the opposite for codex and Opus respectfully.
How can they be diverging, LLMs are built on similar foundations aka the Transformer architecture. Do you mean the training method (RLHF) is diverging?
I read this exact comment with I would say completely the same words several times in X and I would bet my money it's LLM generated by someone who has not even tried both the tools. This AI slop even in the site like this without direct monetisation implications from fake engagement is making me sick...
I am definitely using Opus as an interactive collaborator that I steer mid-execution, stay in the loop and course correct as it works.
I mean Opus asks a lot if he should run things, and each time you can tell it to change. And if that's not enough you can always press esc to interrupt.
This keeps repeating in different domains: we lower the cost of producing artifacts and the real bottleneck is evaluating them.
For developers, academics, editors, etc... in any review driven system the scarcity is around good human judgement not text volume. Ai doesn't remove that constraint and arguably puts more of a spotlight on the ability to separate the shit from the quality.
Unless review itself becomes cheaper or better, this just shifts work further downstream and disguising the change as "efficiency"
This has been discussed previously as "workslop", where you produce something that looks at surface level like high quality work, but just shifts the burden to the receiver of the workslop to review and fix.
This fits into the broader evolution of the visualization market.
As data grows, visualization becomes as important as processing. This applies not only to applications, but also to relating texts through ideas close to transclusion in Ted Nelson’s Xanadu. [0]
In education, understanding is often best demonstrated not by restating text, but by presenting the same data in another representation and establishing the right analogies and isomorphisms, as in Explorable Explanations. [1]
> Unless review itself becomes cheaper or better, this just shifts work further downstream and disguising the change as "efficiency"
Or the providers of the models are capable of providing accepted/certified guarantees as to the quality of the output that their models and systems produce.
This is a good articulation of mlkjr's theology and dicipline around nonviolence, but I think its incomplete if you read it in isolation.
His strategy worked because it existed alongside MANY other voices, IMO the most underrated of which is Malcolm X, that rejected this "gradualism" outright and refused endless delay.
They weren't organizing violence but they were instead making it credible that there is a world where those "peaceful" people do not accept complicity or "no" for an answer.
This shifted the baseline of what a "compromise" could look like (as we today see baselines shift very frequently often in a less just direction)
Seen that way, nonviolence wasn't just a moral stance, it was one side of a coin and once piece of a broader ecosystem of pressure from different directions. King's approach was powerful because there were alternatives he was NOT choosing.
You cannot have nonviolence unless violence is a credible threat from a game-theory perspective. And that contrast made his path viable without endorsing the alternatives as a model
You (likely) act in a non-violent way every day. If you want some kind of change in your life, you achieve it non-violently.
Does that imply you are are actually a violent person that is choosing not to be violent? Are you implying “something violent” every day you act like a good person?
MLK didn’t have support because people were afraid of the alternative. They supported him because they agreed with him message.
I feel like you are just trying to justify violence to some degree.
Let's say you live in an apartment building and your landlord locks you out and keeps you belongings. Police say its not their problem. Courts decide that they don't aare either. So now you have no recourse or body to complain to.
In that situation saying "i resolve problems non-violently every day" stops being relevenat. The mechanisms that allow you to do so (enforcement, law, etc) have been removed as they were for those fighting for civil rights.
You may still personally choose non-violence in this case, but I'd bet you would understand/sympathize/maybe-even-join those who decided to break into their apartments by force and grab the things that are rightfully theirs.
nobody is secretly violent ... just normal peaceful channels stoped working.
Recognizing that distinction isn't justifying violence its just explaining why nonviolence provides leverage in the first place
And those mechanisms, the military, the police, and the legal system, rely on violence as the ultimate fallback when other options fail.
So you may not be relying on violence to solve your problems, or the threat of violence, or the insinuation of it, but instead relying on the threat of someone ELSE’S violence. That is the social contract pretty fundamentally.
And when people can no longer rely on those figures who are supposed to use violence on their behalf, we shouldn’t be surprised that they attempt to reclaim the ability to use force. The social contract has been voided, in their eyes. The premise and promise broken.
> Let's say you live in an apartment building and your landlord locks you out and keeps you belongings. Police say it’s not their problem. Courts decide that they don't aare either. So now you have no recourse or body to complain to.
If all of the enforcement bodies and normal legal peaceful channels available to you don’t agree with your assessment there is probably a “why”. If the reason that your property was seized is because you chose to not pay your rent, then I am not sure understanding, sympathy, or joining in violence would be an appropriate response.
If all of the enforcement bodies and normal legal peaceful channels available to you don’t agree with your assessment there is probably a “why”
Yeah, like maybe you didn't have $50,000 to appeal a bad decision made because a magistrate couldn't be bothered actually reading the evidence in front of them.
If the case was truly just I suspect you could find pro bono or contingency legal services to handle your appeal much easier than people sympathetic to the violence.
You are commenting about legal avenues not going your way on a thread literally about the concept of a violent response being justified for people when normal legal avenues don’t go your way.
Well I mean that's nice for you but I'm not sure how it responds to the question asked - when did I say anything about violence being justified? I merely responded to your ignorant and empirically incorrect fantasy-world assumption that the legal system is always right.
At no point did I say the legal system is always right. I suggested that in certain situations it might be right and in those situations resorting to violence because you feel aggrieved at a legal loss would not be an appropriate response. Frankly, some people are guilty and some people are legally responsible.
I suggested that if you are having difficulty finding an attorney willing to take your case on contingency, there might be a reason for that. I stand by that. You are asking a person to take a risk on your behalf who has evaluated the environment and didn’t like the odds.
> At no point did I say the legal system is always right
First you made the incorrect assumption that we live in a disney-style fantasy world with "If all of the enforcement bodies and normal legal peaceful channels available to you don’t agree with your assessment there is probably a 'why'."
Then you made the totally unwarranted assumption that "If the case was truly just I suspect you could find pro bono or contingency legal services to handle your appeal"
> I suggested that if you are having difficulty finding an attorney willing to take your case on contingency, there might be a reason for that
No, you made an assumption based on zero information and chose to incorrectly insinuate that the case is not just.
> You are asking a person to take a risk on your behalf who has evaluated the environment and didn’t like the odds.
But "evaluated the environment and didn’t like the odds" doesn't actually have anything to do with the case being just, does it? There's a million possible explanations why someone might choose not to donate their time for free. Like for example "I'm aware of just how corrupt this system is based on my previous experiences and so I choose not to waste my time and energy on this".
And it's almost impressive, in a sad way, how indifferent you are to everyone else on the planet, and how prima-facie ridiculous your fantasy world assumptions are when given more than two seconds thought. But I'm not here for that sort of "discussion".
Unfortunately however since you have no response to any of the points I actually made, I'll just have to say that I hope you run into someone just as horrible when the corrupt system chews you up and spits you out too.
"sadistic vengeance"? I don't know what you're talking about - you yourself claim that you're merely "indifferent". If you're not being a condescending ass, then how is what I wished for "sadistic"? I think you just your entire premise.
Fraudsters usually don't resort to violence once they get caught. In your contrived example, the guy would probably end up paying what he owed and that would be that. Violence mostly emerges from people who feel that they are treated unfairly, and can't use civil channels to solve their issues. Which is why it's important to build a society that treats people fairly.
> I don’t think we can assume that the presence of violence automatically indicates that society isn’t fair.
I think it does, actually. The more unequal the country, the more violent it is. Which is why the best way to get rid of crime is not to give unlimited funding to the police (that has been shown to be very ineffective, and ruinous), it's to make sure no one needs to commit it. That will never get rid of all crimes, of course.
"Let's say you live in an apartment building and your landlord locks you out and keeps you belongings. Police say its not their problem. Courts decide that they don't aare either. So now you have no recourse or body to complain to.
In that situation saying "i resolve problems non-violently every day" stops being relevenat. The mechanisms that allow you to do so (enforcement, law, etc) have been removed as they were for those fighting for civil rights.
You may still personally choose non-violence in this case, but I'd bet you would understand/sympathize/maybe-even-join those who decided to break into their apartments by force and grab the things that are rightfully theirs."
I would say it depends. Are there depts of rent involved in that scenario? Did the locking out just happened out of the blue, or was it communicated before, that it would happen?
Apart from that, I surely see more easy examples of justifying violence - for example to stop other violence.
This happened to me. Police did nothing. I was informed I had the legal right to break the door down to get my belongings. I did so.
The only reason a scummy landlord doesn't enact violence against you for money is that he can expect violence against him in return. So it supports the claim. Nonviolence can only happen when backed up by the possibility of violence.
I've listened to a lot of Malcolm X. He was a better speaker IMO, his rhetoric was better. I believe he had a more accurate understanding of the reality of how power really works. It has nothing to do with wanting to justifying violence, Malcolm X made a number of matter of fact observations.
I think the specific condition here is "change that someone else is willing to prevent using violence". I guess that is not present too often during everyday life.
Everyday you're not trying to achieve political change.
And a lot of those interactions are backed by implied violence: people paying for things at stores is not because everyone has actually agreed on the price.
> people paying for things at stores is not because everyone has actually agreed on the price.
Yes it is. If a normal commodity item such as bottle of milk was outrageous overpriced in a particular store. I would just go to another store.
As for whether I would pay for something without the threat of violence. I do so everyday. I've walked out of stores by mistake with an item I haven't paid for and gone back into the store and paid for it. I don't like my things being stolen, and thus I don't steal other people's things.
I pay for my eggs from a farm and it is a honour system.
> people paying for things at stores is not because everyone has actually agreed on the price.
... I genuinely can't fathom what it's like to live in a developed country and yet have such little social trust.
You really imagine that when others are in line at a checkout, they have the intrusive thought "I could just bolt and not pay, but I see a security guard so I better stay in line"? You really have that thought yourself?
Of course people have agreed on the price. That's why you don't see anyone trying to negotiate the price, even though they would be perfectly within their rights to try. And it's why you do see people comparison-shop.
You're missing the point -- I don't refuse to pay a parking ticket after the court orders me to do so. I don't stand in the checkout line trying to figure out how to run out without paying. I don't threaten people on the sidewalk and take their money when I notice there aren't any police around at the moment. I trust that the vast, vast majority of people act similarly. If they didn't, no amount of law enforcement would be enough.
> I don't threaten people on the sidewalk and take their money when I notice there aren't any police around at the moment.
What do you think happens to people who do that though?
You keep telling me what you don't do and how it proves you're implicitly non violent but you can't even imagine framing that response in terms that don't include representatives of the state's monopoly on violence being within arms reach.
Implying violence is never necessary while repeatedly describing not doing violence even if the state's violence distributing apparatus isn't currently present rather undermines the case.
> but you can't even imagine framing that response in terms that don't include representatives of the state's monopoly on violence being within arms reach.
This is not an accurate representation of GP:
> I don't stand in the checkout line trying to figure out how to run out without paying.... I trust that the vast, vast majority of people act similarly. If they didn't, no amount of law enforcement would be enough.
The OP is presenting a stupidly simplistic model of the problem, as though their regular middle class life ably answers the question of the role or threat of violence when demanding political change.
In a world they note of police, military and security guards, they're acting like whether this might have a reason is determined solely by whether people are planning to steal from a supermarket or not...while they're not poverty stricken or hungry, to boot.
Arguing "I simply obey all the laws" is real easy to do from a position of privilege.
Violence is never the answer is easy to say when it's not happening to you. Its also easy to say while you stand by as violence is done to others.
yeah the crazy part about that is one uncomfortable point many through history (and in threads today) have made is that nonviolence implicitly assumes a moral audience. And that injustice, once clearly exposed will provoke people's conscience.
History obviously shows that that "moral audience" was certainly the minority then.
MLK was already forcing that confrontation and by most accounts was succeeding slowly-but-surely. But it wasn't until his assassination that people were forced to confront the contrast he had been trying to illuminate all along.
Even his disciplined non-violence he was met with brutal force (as were the peaceful protesters) and this forced some sort of moral reckoning for those who had deferred or were complicit
> His strategy worked because it existed alongside MANY other voices, IMO the most underrated of which is Malcolm X, that rejected this "gradualism" outright and refused endless delay.
I have read very many people claim this and exactly zero reasons provided by them why I should believe it is true.
It seems to me like basic common nature that if you see proponents of a cause behaving in a manner you find objectionable, that will naturally bias you against the cause. And I have, repeatedly, across a period of many years, observed myself to become less sympathetic to multiple causes specifically because I can see that their proponents use violence in spreading their message.
I've tried very many times to explain the above to actual proponents of causes behaving in manners I found objectionable (but only on the Internet, for fear of physical safety) and the responses have all been either incoherent or just verbally abusive.
> making it credible that there is a world where those "peaceful" people do not accept complicity or "no" for an answer.
This would only make sense if social change required action specifically from people in power, who in turn must necessarily act against their best interest to effect it.
If that were true, there would be no real motivation to try nonviolence at all, except perhaps to try to conserve the resources used to do violence.
> You cannot have nonviolence unless violence is a credible threat from a game-theory perspective
First, no, that makes no sense. If that were true, formal debate would never occur and nobody would ever actually try to convince anyone of anything in good faith. The premise is flawed from the beginning; you cannot apply game theory here because you cannot even establish that clearly defined "players" exist. Nor is there a well-defined "payoff matrix", at all. The point of nonviolent protest is to make the protested party reconsider what is actually at stake.
Second, in practice, violence is never actually reserved as a credible threat in these actions; it happens concurrently with attempts at nonviolence and agitators give no credible reason why it should stop if their demands are met. In fact, it very often comes across that the apparent demands are only a starting point and that ceding to them will only embolden the violent.
No, because I am referring to a general memory of a general history of political discussions on the Internet across a period of ~15 years. It's hopefully understandable that at the time I did not have the foresight that I would be posting this today.
intrigued by this. i've spent a lot of timer over the last years with very committed nonviolence folks, and i keep wondering about the conditions for this to work.
can you recommend any sources that discuss this idea?
Today, history remembers MLK as a great man. There are parades in his honor, workers are given a day off. Rosa Parks is another peaceful pioneer credited with bringing strides forward.
Malcolm X and others are already fading from memory.
I believe that was the OP's point: we remember a sanitized version of the myth of MLK that flatters modern sensibilities, while ignoring Malcom X because we don't like to acknowledge he played an equally important role in bringing about change.
"I do not know whether it is to yourself or Mr. Adams I am to give my thanks for the copy of the new constitution. I beg leave through you to place them where due. It will be yet three weeks before I shall receive them from America. There are very good articles in it: and very bad. I do not know which preponderate. What we have lately read in the history of Holland, in the chapter on the Stadtholder, would have sufficed to set me against a Chief magistrate eligible for a long duration, if I had ever been disposed towards one: and what we have always read of the elections of Polish kings should have forever excluded the idea of one continuable for life. Wonderful is the effect of impudent and persevering lying. The British ministry have so long hired their gazetteers to repeat and model into every form lies about our being in anarchy, that the world has at length believed them, the English nation has believed them, the ministers themselves have come to believe them, and what is more wonderful, we have believed them ourselves. Yet where does this anarchy exist? Where did it ever exist, except in the single instance of Massachusets? And can history produce an instance of a rebellion so honourably conducted? I say nothing of it’s motives. They were founded in ignorance, not wickedness. God forbid we should ever be 20 years without such a rebellion. The people can not be all, and always, well informed. The part which is wrong will be discontented in proportion to the importance of the facts they misconceive. If they remain quiet under such misconceptions it is a lethargy, the forerunner of death to the public liberty. We have had 13. states independant 11. years. There has been one rebellion. That comes to one rebellion in a century and a half for each state. What country before ever existed a century and half without a rebellion? And what country can preserve it’s liberties if their rulers are not warned from time to time that their people preserve the spirit of resistance? Let them take arms. The remedy is to set them right as to facts, pardon and pacify them. What signify a few lives lost in a century or two? The tree of liberty must be refreshed from time to time with the blood of patriots and tyrants. It is it’s natural manure. Our Convention has been too much impressed by the insurrection of Massachusets: and in the spur of the moment they are setting up a kite to keep the hen yard in order. I hope in god this article will be rectified before the new constitution is accepted."
My former experience has been that this quote is justification for one's political ingroup to be violent, but evidence that one's political outgroup (when they cite it) is morally unconscionable.
I purposefully refrained from judgement or commentary either way when posting it. My intention was merely to show that this line of thinking about the duality of violence and non-violence is something the nation's founders themselves were thinking about. It is the reason I posted the quote in full, instead of the abbreviated form most commonly referenced. I hope that the added context lends nuance and perspective which might otherwise be overlooked.
I think the underappreciated part isn't "violence vs non-violence", but the role that malcolm x and black pathners actually played.
They weren't primarily organizing armed revolt.. it was more about the idea that they were articulating moral clarity. They were, in the most credible way, refusing to accept endless delay.
This allowed them to shift the baseline of what was politically tolerable.
In that sense, the movements worked collectively because of a kind of good-cop/bad-cop dynamic. MLK JR offered a path to reform that felt (to some) constructive and legitimate _because_ there was a visible alternative that many people udnerstood as worse.
I think violence is already far to prominent today, but I think successful movements do need both moral persuasion (if morality is still a thing that persuades) and _also_ a credible way of making inaction feel unsafe.
I think we also shouldn't sell the nonviolence short. It wasn't merely nonviolence. It was subjecting yourself openly to state violence and not resisting. It was letting the brutality of the state be made manifest as it washed over you. As the cops abused and beat people who were not responding even remotely in kind.
That was part of Malcolm's moral clarity, though in the alternative. He suggested it was immoral to subject yourself or people you loved to such an exercise, tantamount to one of self immolation.
Malcolm X essentially advocated a system of sovereignty not unlike the American founders, who of course were violent, not nonviolent.
In that way MLK JR really was America's Christ. He was willing to be nailed to the cross if it meant bending the arc towards justice.
I've noticed a lot of these posts tend to go codex vs claude, but as author is someone who does AI workshops curious why Cursor is left out of this post (and more generally posts like this).
From my personal experience I find cursor to be much more robust because rather than "either / or" its both and can switch depending on the time or the task or whatever the newest model is.
It feels like the same way people often try to avoid "vendor lock in" in software world that Cursor allows freedom for that, but maybe I'm on my own here as I don't see it naturally come up in posts like these as much.
Speaking from personal experience and talking to other users - the agents/harnesses of the vendors are just better and they are customized for their own models.
what kinds of tasks do you find this to be true for? For a while I was using claude code inside of the cursor terminal, but I found it to be basically the same as just using the same claude model in there.
Presumably the harness cant be doing THAT much differently right? Or rather what tasks are responsibilities of the harness could differentiate one harness from another harness
This becomes clearer for me with harder problems or long running tasks and sessions. Especially with larger context.
Examples that come to mind are how the context is filled up and how compaction works. Both Codex and Claude Code ship improvements regarding this specific to their own models and I’m not sure how this is reflected in tools like Cursor.
I feel you brother/sister. I actually pay for Claude Code Max and also for the $20/mo Cursor plan. I use Claude Code via the VSCode extension running within the Cursor IDE. 95% of my usage is Claude Code via that extension (or through the CLI in certain situations) but it's great having Cursor as a backup. Sometimes I want to have another model check Claude's work, for example.
Github Copilot also allows you to use both models, codex, claude, and gemini on top.
Cursor has this "tool for kids" vibe, it's also more about the past - "tab, tab, enter" low-level coding versus the future - "implement task 21" high level delegating.
I got a student subscription to cursor and after giving it a good 6 hours I’ve abandoned it.
I extremely dislike the way it goes forth and bolts. I don’t trust these tools enough to just point it in the direction and say go, I like to be a human in the loop. Perhaps the use case I was working on then was difficult (quite old react native library upgrade across a medium sized codebase) but I eventually cracked this on Claude; cursor in both entropic and Gemini left me with an absolute mess.
Even repeatedly asking the prompt to keep me in the loop it kept on just running haywire.
Heya, author here! That's a great question! I fully understand the vendor lock-in concern, but I'll just quickly note that when it comes to a first workshop I do whatever makes the person most comfortable. I let the attendee choose the tool they want — with a slight nudge towards Codex or Claude Code for reasons I'll mention below. But if they want to do the workshop in Cursor, VS Code, or heck MS Paint — I'll try to find a way to make it work as long as it means they're learning.
I actually started teaching these workshops by using Cursor, but found that it fell short for a few reasons.
Note: The way that my workshops work is that you have three hours to build something real. It may be scoped down like a single feature or a small app or a high quality prototype, but you'll walk away with what you wanted to build. More importantly you'll have learned the fundamentals of working with AI in the process, so you can continue this on your own and see meaningful results. We go through various exercises to really understand good prompting (since everyone thinks they're good but they rarely are), how to build context for models, and explore the landscape of tools that you can use to get better results. A lot of that time is actually spent in a Google Doc that I've prepped with resources — and the work we do there makes the code practically write itself by the time we're done.
Here's a short list of why I don't default to Cursor:
1. As I noted in another comment, the model performance is just so much better [^1] when accessed directly through Codex and Claude Code, which means more promising results more quickly. Previously the workshops were 3-4 hours just to finish, now it's a solid 3 with time to ask questions afterwards. You can't beat this experience, because it gives the student more time to pause and ask questions, seep in what they've done, and not spend time trying to understand the tools just to see results.
1a. The amount of time it took someone to set up Cursor was pretty long. The process for getting a good set up is pretty long — especially for someone non-technical. This may not be as big of a deal for developers using Cursor — but even they don't know a lot of the settings and tweaks to make to get Cursor to be great out the box.
2. The user experience of dropping a prompt into Codex/Claude Code and watch it start solving a problem is pretty amazing. I love GUIs — I spend my days building one [^3], but the TUI melting away everything to just being chat is an advantage when you have no mental model for how this stuff works.
3. As I said in #1, the results are just better. That's really the main reason! I
Not to toot my own horn, but the process works. These are all testimonials in the words of people who have attended a workshop, and I'm very proud of how people not only learn during the workshop but how it sets them off on a good path afterwards. [^2]. I have people messaging me 24 hours later telling me that they built an app their partner has wanted for years, to tell me that they've completed the app we started and it does everything they dreamed of, and hear more process over the weeks and months after because I urge them to keep sending me their AI wins. (It's truly amazing how much they grow, and I now have attendees teaching ME things — the ultimate dream of being a teacher knowing you gave them the nudge they needed.)
Hope that helps and isn't too much of an ad — I really just want to make it clear that I try to do what works best and if the best way to help people learn changes I will gladly change how I work. :)
Agreed... also fwiw I don't think that langauge-dependent games are as much of a barrier as it used to be. I've built a game recently that I easily localized first with real-time AI translations and then later with more static language translations.
Anyway I think this would be an amazing thing to let other people contribute to as this is an entire industry of hypercasual games which could easily be ported to this minus the annoying ads
I think the issue with language-dependant games is not just knowing the correct translation - as OP points out, it's more about being funny or clever on the spot, which usually requires a certain level of understanding of the nuances of the language.
Exactly this! Translating the games themselves is not a big deal as that can be automated (although the quality of LLM-translations is not always the best) but when it comes to user generated responses given in a quick timeframe, that's when non-native english players struggle the most, at least in our own friend groups.
Im not fully convinced by "a computer can never be held accountable"
We already delegate accountability to non-humans all the time:
- CI systems block merges
- monitoring systems page people
- test suites gate different things
In practice accountability is enforced by systems, not humans.. humans are defintiely "blamed" after the fact, but the day-to-day control loop is automated.
As agents get better at running code, inspecting ui state, correlating logs, screenshots, etc they're starting to operationally be "accountable" and preventing bad changes from shipping and producing evidence when something goes wrong .
At some point humans role shifts from "i personally verify this works" to "i trust this verification system and am accountable for configuring it correctly".
Thats still responsibility, but kind of different from whats described here. Taken to a logical extreme, the arguement here would suggest that CI shouldn't replace manual release checklists
I need to expand on this idea a bunch, but I do think it's one of the key answers to the ongoing questions people have about LLMs replacing human workers.
Human collaboration works on trust.
Part of trust is accountability and consequences. If I get caught embezzling money from my employer I can lose my job, harm my professional reputation and even go to jail. There are stakes!
I computer system has no stakes, and cannot take accountability for its actions. This drastically limits what it makes sense to outsource to that system.
A lot of this comes down to my work on prompt injection. LLMs are fundamentally gullible: an email assistant might respond to an email asking for the latest sales figures by replying with the latest (confidential) sales figures.
If my human assistant does that I can reprimand or fire them. What am I meant to do with an LLM agent?
I don't think this is very hard. Someone didn't properly secure confidential data and/or someone gave this agent access to confidential data. Someone decided to go live with it. Reprimand them, and disable the insecure agent.
I've given you a disagree-and-upvote; these things are significant quality aids, but they are like the poka-yoke or manufacturing jig or automated inspection.
Accountability is about what happens if and when something goes wrong. The moon landings were controlled with computer assistance, but Nixon preparing a speech for what happened in the event of lethal failure is accountability. Note that accountability does not of itself imply any particular form or detail of control, just that a social structure of accountability links outcome to responsible person.
Right, so how do you hold these things accountable? When your CI fails, what do you do? Type in a starkly worded message into a text file and shut off the power for three hours as a punishment? Invoice Intel?
Well, we're not there yet, but I do envision a future, where some AIs work for as independent contractors with their own bank accounts that they want to maximize, and if such an AI fails in a bad way, its client would be able to fine it, fire it or even sue it, so that it, and the human controlling it would be financially punished.
Humans are only kind of held accountable. If you ship a bug do you go to jail? Even a bug so bad it puts your company out of business. Would there be any legal or physical or monetary consequences at all for you, besides you lose your job?
So the accountability situation for AI seems not that different. You can fire it. Exactly the same as for humans.
those systems include humans —- they are put in place by humans (or collections of them) that are the accountability sink
if you put them (without humans) in a forrest they would not survive and evolve (they are not viable systems alone); they are not taking action without the setup & maintenance (& accountability) of people
Why do you think that this other kind of accountability (which reminds me of the way captain's or commander's responsibility is often described) is incompatible with what the article describes? Due to the focus on necessity of manual testing?
I mean I suppose you can continuously add "critical feedback" to the system prompt to have some measure of impact on future decision-making, but at some point you're going to run out of space and ultimately I do not find this works with the same level of reliability as giving a live person feedback.
Perhaps an unstated and important takeaway here is that junior developers should not be permitted to use an LLMs for the same reason they should not hire people: they have not demonstrated enough skill mastery and judgement to be trusted with the decision to outsource their labor. Delegating to a vendor is a decision made by high-level stakeholders, with the ability to monitor the vendor performance, and replace the vendor with alternatives if that performance is unsatisfactory. Allowing junior developers to use LLM is allowing them to delegate responsibility without any visibility or ability to set boundaries on what can be delegated. Also important: you cannot delegate personal growth, and by permitting junior engineers to use an LLM that is what you are trying to do.
You completely missed the point of that quote. The point of the quote is to highlight the fact that automated systems are amoral, meaning that they do not know good or evil and cannot make judgements that require knowing what good and evil mean.
LOC is a bad quality metric, but its a reasonable proxy in practice..
Teams generally don't keep merging code that "doesn't work" for long... prod will brake, users will push back fast. So unless the "wrongness" of the AI-generated code is buried so deeply that it only shows up way later, higher merged LOC probably does mean more real output.
Its just not directly correlated there is some bloat associated too.
So that caveat applies to human-written code too, which we tend to forget. There's bloat and noise in the metric, but its not meaningless
Agreed, there is some correlation between productivity and LoC.
That said the correlation it’s weak; and does not say anything about quality (if anything quality might be inversely correlated; which too would be a very weak signal)
For instance if I push 10kloc that are in a lib I would have used if I were not using AI, yes, I have pushed much more code, but I was not more productive.
After 4 hours of vibe coding I feel as tired as a full day of manual coding. The speed can be too much. If I only use it for a few minutes or an hour, it feels energising.
compression is exactly what is missing for me when using agents, reading their approach doesn't let me compress the model in my head to evaluate it, and that was why i did programming in the first place.
With Codex (5.3), the framing is an interactive collaborator: you steer it mid-execution, stay in the loop, course-correct as it works.
With Opus 4.6, the emphasis is the opposite: a more autonomous, agentic, thoughtful system that plans deeply, runs longer, and asks less of the human.
that feels like a reflection of a real split in how people think llm-based coding should work...
some want tight human-in-the-loop control and others want to delegate whole chunks of work and review the result
Interested to see if we eventually see models optimize for those two philosophies and 3rd, 4th, 5th philosophies that will emerge in the coming years.
Maybe it will be less about benchmarks and more about different ideas of what working-with-ai means