Hacker News | new | past | comments | ask | show | jobs | submit | arkh's comments

I expected (and still expect) a lot from LLMs in cross-disciplinary research.

I think they should be the perfect tool for finding methods or results in one field that look like they could be used in another.


This might actually be a limitation of the "predict next word" approach, since the network is never trained to predict a result in one field from a result in another. It might still make the connection, though, just not as easily.

Every modern (and not so modern) software development method hinges on one thing: requirements are not known, and even if known, they'll change over time. From this you get the goal of "good" code, which is easy-to-change code.

Do current LLM-based agents generate code which is easy to change? My gut feeling is no at the moment. Until they do, I'd argue code generated by agents is only good for prototypes. Once you can ask your agent to change a feature and be 100% sure it won't break other features, then you don't care what the code looks like.


All the hype is about how fast it is to produce code. But the actual bottleneck has always been the cost of specifying intent clearly enough that the result is changeable, testable, and correct, AND that you build something that brings value.

> Once you can ask your agent to change a feature and be 100% sure it won't break other features, then you don't care what the code looks like.

That bar is unreasonably high.

Right now, if I ask a senior engineer to change a feature in a mature codebase, I only have perhaps 70% certainty they won't break other features. Tests help, but only so far.


This bar only seems high because the bar in most companies is already unreasonably low. We had decades of research into functional programming, formal methods and specification languages. However, code monkey culture was cheaper and much more readily available. Enterprise software development has always been a race to the bottom, and the excitement for "vibe coding" is just the latest manifestation of its careless, thoughtless approach to programming.

> functional programming, formal methods and specification languages

Haha. Tell me you've never done professional software development without telling me. None of those things are solutions to the actual problem, which is: does the code deliver the business value it's supposed to?


There are limits to how badly such a senior can screw up, or more likely forget some corner-case situation. And they are on top of their own code and the whole codebase, getting better each time, changing only what's needed, reverting unnecessary changes, seeing the bigger picture. That's (also) seniority.

An LLM brings an illusion of that: a statistical model that may or may not hit what you need. Repeat the question twice and the senior will be better at the task the second time. The LLM will simply produce different output, maybe.

Do you feel like you have full control over what's happening here? Business has an absolutely insatiable lust for control, and IT systems are the area of each business that the C-suite always feels it has the least control over.

Reproducibility and general trust are not something marginal but the core of good delivery. Just read this thread: LLMs have zero of that.


But if push comes to shove, any other engineer can come in and debug your senior engineer's code. That's why we insist on people creating easy-to-change code.

With auto-generated code which almost no one will check or debug by hand, you want at least compiler-level exactitude. Then changing "the code" is as easy as asking your code generator for new things. If people have to debug its output, then it does not help in making maintainable software unless it also generates "good" code.


I am constantly getting LLMs to change features and fix bugs. The key is to micromanage the LLM and its context, and to read the changes. It's slower than vibe coding but faster than coding by hand, and it results in working, maintainable software.

A study last year concluded that while AI coding feels faster, it actually isn't. At least as of mid-2025.

https://news.ycombinator.com/item?id=44522772


The comments explain the nuance there pretty well:

> This study had 16 participants, with a mix of previous exposure to AI tools - 56% of them had never used Cursor before, and the study was mainly about Cursor.

> My intuition here is that this study mainly demonstrated that the learning curve on AI-assisted development is high enough that asking developers to bake it into their existing workflows reduces their performance while they climb that learning curve.

Giving people a tool they have no experience with and expecting them to be productive feels... odd?


That's a good point. I am the easiest person to fool.

I knocked together a quick analysis of my commit graphs going back several years, if you're interested: https://mccormick.cx/gh/

My average leading up to 2023 was around 2k commits per year. 2023 I started using ChatGPT and I hit my highest commits so far that year at 2,600. 2024 I moved to a different country, which broke my productivity. I started using aider at the end of 2024 and in 2025 I again hit my highest commits ever at 2,900. This year is looking pretty solid.

From this it looks to me like I'm at least 1.4x more productive than before.

As a freelancer I have to track issues closed and hours pretty closely so I can give estimates and updates to clients. My baseline was always "two issues closed per working day". These are issues I create myself (full stack, self-managed freelancer) so the average granularity has stayed roughly constant.

This morning I closed 8 issues on a client project. I estimate I am averaging around 4 issues per working day these days. I know this because I have to actually close the issues each day. So on that metric my productivity has roughly doubled.

I believe those studies for sure. I think there is nuance to using these tools well, and I think a lot of people are going backwards and introducing more bugs than progress through vibe coding. I do not think I have gone backwards, and the metrics I have available seem to agree with that assessment.


Love your approach and that you actually have "before vs. after" numbers to back it up!

I personally also use AI in a similar way, strongly guiding it instead of vibe-coding. It reduces frustration because it surely "types" faster and better than me, including figuring out some syntax nuances.

But often I jump in and do some parts by myself. Either "starting" something (creating a directory, file, method etc.) to let the LLM fill in the "boring" parts, or "finishing" something by me filling in the "important" parts (like business logic etc.).

I think it's way easier to retain authorship and codebase understanding this way, and it's more fun as well (for me).

But in the industry right now there is a heavy push for "vibe coding".


That makes a lot of sense. Staying hands on is key.

Anything from 6 months ago in AI development is too old to be relevant.

> Do current LLM-based agents generate code which is easy to change?

They do. I am no longer writing code; everything I commit is 100% generated using an agent.

And it produces code based on the code already in my codebase and on my instructions, which tell it about clean code and good practices.

If you don't get maintainable code from an LLM it's for this reason: Garbage in, garbage out.


Doesn’t this presuppose that you already know how to produce good code? How will anyone in the future do this when they haven’t actually programmed?

No. They’re great at slopping out a demo, but god help you if you want minor changes to it. They completely fall apart.

I'd add in "code is easier to write than it is to read" - hence abstraction layers designed to present us with higher level code, hiding the complex implementations.

But LLMs are really good at both writing code _and_ reading code. However, they're not great at knowing when to stop: either finishing early and leaving stuff broken, over-engineering and adding stuff that's not needed, or deciding it's too hard and just removing stuff it deems unimportant.

I've found a TDD approach (with not just unit tests but high-level end-to-end behaviour-driven tests) works really well with them. I give them a high-level feature specification (remember Gherkin specifications?) and tell it to make that pass (with unit tests for any intermediate code it writes), make sure it hasn't broken anything (by running the other high-level tests) then, finally, refactor. I've also just started telling it to generate screenshots for each step in the feature, so I can quickly evaluate the UI flow (inspired by Simon Willison's Rodney tool).
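For anyone who doesn't remember them, a minimal sketch of the kind of high-level Gherkin feature specification described above (the feature and steps are invented for illustration, not from an actual project):

```gherkin
Feature: Password reset
  Scenario: User resets a forgotten password
    Given a registered user with email "user@example.com"
    When they request a password reset
    And they follow the link in the reset email
    And they choose a new password
    Then they can log in with the new password
    And the old password no longer works
```

The point is that a spec at this level describes observable behaviour, so the agent can be told "make this pass" without being told how to implement it.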

Now I don't actually need to care if the code is easy to read or easy to change - because the LLM handles the details. I just need to make sure that when it says "I have implemented Feature X" that the steps it has written for that feature actually do what is expected and the UI fits the user's needs.


This is the brake on “AI will replace all developers”.

Coding is a correctness-discovery process. For a real product you need to build it to know the right thing. As the product matures, those constraints increase in granularity down to tighter bits of code (security, performance, etc.).

You can have AI write 100% of the code, but more mature products might care about more and more specific low-level requirements.

The times you can let an agent swarm just go are cases very well specified by years of work (like the Anthropic C compiler).


> Do current LLM-based agents generate code which is easy to change?

Yes, if that's your goal and you take steps to achieve that goal while working with agents.

That means figuring out how to prompt them, providing them good examples (they'll work better in a codebase which is already designed to afford future changes since they imitate existing patterns) and keeping an eye on what they're doing so you can tell them "rewrite that like X" when they produce something bad.

> Once you can ask your agent to change a feature and be 100% sure it won't break other features

That's why I tell them to use red/green TDD: https://simonwillison.net/guides/agentic-engineering-pattern...
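A minimal sketch of the red/green loop in pytest style (the function and behaviour are illustrative, not from the linked guide): the test is written first and fails because nothing exists yet, then just enough implementation is added to make it pass.

```python
# Red: write a failing test first, before any implementation exists.
def test_slugify_replaces_spaces_and_lowercases():
    assert slugify("Hello World") == "hello-world"

# Green: write just enough implementation to make the test pass.
def slugify(title: str) -> str:
    return title.strip().lower().replace(" ", "-")
```

An agent following this pattern has to demonstrate the failure before it is allowed to claim the feature works, which catches a lot of "I have implemented Feature X" hallucinations.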


We won't be able to be 100% sure with LLMs, but maybe proper engineering around evals gets us to an acceptable level of quality based on the blast radius/safety profile.

I'd also argue that we should be pushing towards tracer bullets as a development concept, and less towards prototypes that are nice but meant to be thrown away, which people might not actually do.

The clean-room auto-porting, after a messy exploratory prototyping session, would be a nice pattern nonetheless.


Irrelevant because you are not going to make new changes by hand. You will use AI for that.

The thing is, the Toyota method relies on people at every level working to improve processes. If you're an employee and know you'll be there 10 years down the line, or even until you retire, you have an incentive to improve said processes.

Now look at most Western companies: since the '70s/'80s, everything has been about reducing headcount. Lay-offs, outsourcing, offshoring; now the concept of spending your whole working life at the same company feels like a fever dream. So why would an employee try to improve things for the company when they know there is no future for them there? Better to improve their own career and future prospects. So yeah, things like Kaizen are doomed to fail until things change.


> Lay-offs, outsourcing, offshoring, now the concept of spending your whole working life at the same company feels like a fever dream

You are missing something here imo: very few companies actually increase pay (or, to be clearer, show a clear way to get there) enough to make it attractive to stay for long periods of time.

From my experience here in Germany the people staying at companies for a long time are those who don't focus on their career.


Moving around distributes knowledge, making for a healthier economy overall. The alternative looks like Korean chaebols.

Well, anyone using the product of an open source project is free to fork it and then take on the maintenance. Or organize multiple users to handle the maintenance.

I don't expect free shit forever.


One thing most of those lack is an easy way to share screen.

Now if anyone wants to differentiate their Discord alternative, they should have most of Discord's functionality and add the possibility of being in multiple voice chats (maybe with rights and a channel hierarchy + different push-to-talk binds). It's a missing feature when doing huge operations in games, and using the Canary client is not always enough.


Matrix screen sharing is a feature of Element Call / MatrixRTC (in development).

For now, I think they do it through their Jitsi integration. I don't know how easy it is, as I haven't tried it.

https://docs.element.io/latest/element-cloud-documentation/i...


I’ve been self-hosting Element Call and use it to call my girlfriend (and also used it with another friend a few nights ago). I’ve had a few problems where, when starting the call, it seems not to connect but just trying again works. That’s really the only issue I can think of since setting up a TURN server (before that it would completely fail sometimes, but that’s not Element Call’s fault).


Thanks for sharing. I think the design of MatrixRTC (especially the scaling via hierarchical SFUs) looks promising. It's nice to see someone actually using it at this early stage, even if only for 1:1 calls.


Stoat has screen sharing / video calling in the pipeline at least: https://github.com/stoatchat/stoatchat/issues/313


According to the last comment in the issue, it is already available for self-hosted clients.


I use MiroTalk for it. Within Element you can set up widgets (basically PWAs), so you can call via Element’s built-in Jitsi widget (or a more reliable dedicated Jitsi link) and then use MiroTalk to share screens. It is a LOT better, especially for streaming video.

In terms of ease of use, it’s like three clicks. Technically more than Discord, but it’s p2p streaming, so the quality is far nicer.


Jitsi does that well


Yup, I was expecting pgtune to be mentioned in the article.

And maybe something like HammerDB to check performance.
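For context, pgtune-style tuning mostly means setting a handful of memory and planner parameters relative to the machine. A sketch for a hypothetical dedicated 16 GB server (illustrative values following common rules of thumb, not pgtune's actual output; always benchmark for your workload):

```ini
# postgresql.conf (illustrative values for a dedicated 16 GB machine)
shared_buffers = 4GB            # commonly ~25% of RAM
effective_cache_size = 12GB     # planner hint: RAM available for caching
work_mem = 32MB                 # per-sort/hash-operation memory
maintenance_work_mem = 1GB      # for VACUUM, CREATE INDEX, etc.
random_page_cost = 1.1          # lower on SSDs than the spinning-disk default
```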


My question would be: what are the myriad other projects you tasked Opus 4.6 to build that it could not get to a point where you could kinda-sorta make a post about them?

This kind of headline makes me think of p-hacking.


Isn't identity theft a problem in the US? Especially because something that was not meant to be used as an ID is used as one (the SSN)?


> Like blameless postmortems taken to a comical extreme where one person is always doing something careless that causes problems and we all have to brainstorm a way to pretend that the system failed, not the person who continues to cause us problems.

Well, I'd argue the system failed in that the bad person was not removed. The root cause is then a bad hiring decision and bad management of problematic people. You can do a blameless postmortem guiding a change in policy which ends in some people getting fired.


> You can do a blameless postmortem guiding a change in policy which ends in some people getting fired.

In theory maybe, but in my experience the blameless postmortem culture gets taken to such an extreme that even when one person is consistently, undeniably to blame for causing problems we have to spend years pretending it’s a system failure instead. I think engineers like the idea that you can engineer enough rules, policies, and guardrails that it’s impossible to do anything but the right thing.

This can create a feedback loop where the bad players realize they can get away with a lot because if they get caught they just blame the system for letting them do the bad thing. It can also foster an environment where it’s expected that anything that is allowed to happen is implicitly okay to do, because the blameless postmortem culture assigns blame on the faceless system rather than the individuals doing the actions.


Agreed. The concept of a 'blameless' postmortem came from airplane crash investigation, but if one pilot crashed 6 commercial jets, we wouldn't say "must be a problem with the design of the controls".


So what do they actually say in aviation? There was a pilot suicide that took the whole plane, Germanwings Flight 9525; I find it more important that the aviation industry made regulatory changes than the fact that (probably) "they blamed the pilot".

I think there are too many people who actually like "blaming someone else", and that causes issues beyond software development.


I hope that the pilot responsible was fired and got his license revoked!


Fair point


Blameless postmortems are for processes where everyone is acting in good faith and a mistake was made and everyone wants to fix it.

If one party decides that they don’t want to address a material error, then they’re not acting in good faith. At that point we don’t use blameless procedures anymore, we use accountability procedures, and we usually exclude the recalcitrant people from the remediation process, because they’ve shown bad faith.


> Well, I'd argue the system failed in that the bad person is not removed.

This is just a proxy for "the person is bad", then. There's no need to invoke a system. Who can possibly trace back all the things that could or couldn't have been spotted at the interview stage or during probation? Who cares, when the end result is "fire the person" or, probably, "promote the person".


I think as an employer you would prefer not to hire another person who is not productive.

Your customers would prefer the enterprise to be doing stuff rather than hiring and firing.


Of course everyone would prefer that, but hiring is by far the most random thing an org does, even when it spends a huge amount on hiring.


> How do you suggest to deal with Gemini?

Don't. I do not ask my mechanic for medical advice, why would I ask a random output machine?


This "random output machine" is already in large use in medicine so why exactly not? Should I trust the young doctor fresh out of the Uni more by default or should I take advises from both of them with a grain of salt? I had failures and successes with both of them but lately I found Gemini to be extremely good at what it does.


The "well we already have a bunch of people doing this and it would be difficult to introduce guardrails that are consistently effective so fuck it we ball" is one of the most toxic belief systems in the tech industry.


> This "random output machine" is already in large use in medicine

By doctors. It's like handling dangerous chemicals: if you know what you're doing you get good results, otherwise you just melt your face off.

> Should I trust the young doctor fresh out of the Uni

You trust the process that got the doctor there: the knowledge they absorbed, the checks they passed. The doctor doesn't operate in a vacuum; there's a structure in place to validate critical decisions. Anyway, you won't blindly trust one young doctor; if it's important, you get a second opinion from another qualified doctor.

In the fields I know a lot about, LLMs fail spectacularly so, so often. Having that experience and knowing how badly they fail, I have no reason to trust them in any critical field where I cannot personally verify the output. A medical AI could enhance a trained doctor, or give false confidence to an inexperienced one, but on its own it's just dangerous.


There's a difference between a doctor (an expert in their field) using AI (specialising in medicine) and you (a lay person) using it to diagnose and treat yourself. In the US, it takes at least 10 years of studying (and interning) to become a doctor.


Even so, it's rather common for doctors not to be able to diagnose correctly. It's a guessing game for them too. I don't know so much about the US, but it's a real problem in large parts of the world. As the comment stated, I would take anything a doctor says with a pinch of salt, particularly when the problem is not obvious.


These things are not equivalent.

This is really not that far off from the argument that "well, people make mistakes a lot, too, so really, LLMs are just like people, and they're probably conscious too!"

Yes, doctors make mistakes. Yes, some doctors make a lot of mistakes. Yes, some patients get misdiagnosed a bunch (because they have something unusual, or because they are a member of a group—like women, people of color, overweight people, or some combination—that American doctors have a tendency to disbelieve).

None of that means that it's a good idea to replace those human doctors with LLMs that can make up brand-new diseases that don't exist occasionally.


It takes 10 years of hard work to become a proficient engineer too, yet that doesn't stop us from missing things. That argument cannot hold. AI is already widespread in medical treatment.


An engineer is not a doctor, nor a doctor an engineer. Yes, AI is being used in medicine - as a tool for the professional - and that's the right use for it. Helping a radiologist read an X-ray, MRI scan or CT scan, helping a doctor create an effective treatment plan, warning a pharmacologist about unsafe combinations (dangerous drug interactions) when different medications are prescribed, etc. are all areas where AI can make the job of a professional easier and better, and also help create better AI.


And where did I claim otherwise? You're not disagreeing with me, only reinforcing my point.


When a doctor gets it wrong they end up in a courtroom, lose their job and the respect of their peers.

Nobody at Google gives a flying fuck.


Not really; those are exceptional cases. For most misdiagnoses, or failures to diagnose at all, nothing happens to the doctor.


Why stop at AI? By that same logic, we should ban non-doctors from being allowed to Google anything medical.


Nobody can (and nobody should) stop you from learning and educating yourself. It doesn't mean, however, that just because you can use Google or AI, you can become a doctor:

- Bihar teen dies after ‘fake doctor’ conducts surgery using YouTube tutorial: Report - https://www.hindustantimes.com/india-news/bihar-teen-dies-af...

- Surgery performed while watching YouTube video leaves woman dead - https://www.tribuneindia.com/news/uttar-pradesh/surgery-perf...

- Woman dies after quack delivers her baby while watching YouTube videos - https://www.thehindu.com/news/national/bihar/in-bihar-woman-...

Educating a user about their illness and treatment is a legitimate use case for AI, but acting on its advice to treat yourself or self-medicate would be plain stupidity. (Thankfully, self-medicating isn't that easy, because most medications require a prescription. However, so-called "alternative" medicines are often a grey area, even with regulations, for example in India.)


> This "random output machine" is already in large use in medicine so why exactly not?

Where does "large use" of LLMs in medicine exist? I'd like to stay far away from those places.

I hope you're not referring to machine learning in general, as there are worlds of differences between LLMs and other "classical" ML use cases.



Instead of asking me to spend $150 and 4 hours, could you maybe just share the insights you gained from this course?


No, I'm not asking you to spend $150; I'm providing the evidence you're looking for. Mayo Clinic, probably one of the most prominent private clinics in the US, is using transformers in their workflow, and there are many other similar links you could find online, but you choose to remain ignorant. Congratulations.


The existence of a course on this topic is NOT evidence of "large use". The contents of the course might contain such evidence, or they might contain evidence that LLM use is practically non-existent at this point (the flowery language used to describe the course is used for almost any course tangentially related to new technology in the business context, so that's not evidence either).

But your focus on the existence of this course as your only piece of evidence is evidence enough for me.


Focus? You asked me for evidence. I provided you with one, and one which carries a lot of weight. If that's the focus you're looking for, then sure. Take it as you will; I am not here to convince anyone of anything. Look at the past to see how transformers have solved long-standing problems nobody believed were tractable up to that point.


An online course, even if offered by a reputable medical institution, hardly backs your argument.


An LLM is just a tool, and how the tool is used is also an important question. People vibe code these days, sometimes without proper review, but do you want them to vibe code a nuclear reactor controller without reviewing the code?

In principle we could just let anyone use an LLM for medical advice, provided they know LLMs are not reliable. But LLMs are engineered to sound reliable, and people often just believe their output. And cases have shown that this can have severe consequences...


- The AI that are mostly in use in medicine are not LLMs

- Yes. All doctors' advice should be taken cautiously, and every doctor recommends you get a second opinion for that exact reason.

