
> But that's not how most products work.

That's exactly how most products work :-/

> If you buy a table saw and can't figure out how to cut a straight line in a piece of wood with it - or keep cutting your fingers off - but didn't take any time at all to learn how to use it, that's on you.

Of course - that's deterministic, so if you make a mistake and it comes out wrong, you can fix the mistake you made.

> Why should LLMs be any different?

Because they are not deterministic; you can't use experience with LLMs in any meaningful way. They may give you a different result when you run the same spec through the LLM a second time.



> Because they are not deterministic; you can't use experience with LLMs in any meaningful way. They may give you a different result when you run the same spec through the LLM a second time.

Lots of things, humans included, are just as non-deterministic; I absolutely do use experience working with humans and other non-deterministic things to improve my future interactions with them.

Table saws are kinda infamous in this regard: you may say that kickback is hidden state/incomplete information rather than non-determinism, but in practice the impact is the same.


> They may give you a different result when you run the same spec through the LLM a second time.

Yes, kind of, but the results only differ (maybe) for the things you didn't specify. If you ask for A, B, and C, and the LLM made its own choice to implement C in "the wrong way" (according to you), you can retry and specify exactly how you want C implemented, and it should follow that.

Once you've nailed your "spec" enough so there isn't any ambiguity, the LLM won't have to make any choices for you, and then you'll get exactly what you expected.

Learning this process, and learning how much and exactly what you have to instruct it to do, is how you build up experience working with an LLM; that's meaningful, and something you get better at with practice.


> Yes, kind of, but the results only differ (maybe) for the things you didn't specify.

No. They will produce a different result for everything, including the things you specify.

It's so easy to verify that I'm surprised you're even making this claim.

> Once you've nailed your "spec" enough so there isn't any ambiguity, the LLM won't have to make any choices for you, and then you'll get exactly what you expected

1. There's always ambiguity, or else you'll spend an eternity writing specs

2. LLMs will always produce different results even if the spec is 100% unambiguous, for a huge variety of reasons, the main one being that their output is non-deterministic, except in the most trivial of cases. And even then, the simple fact of "your context window is 80% full" can lead to things like "I've rewritten half of your code even though the spec only said that the button color should be green"


> It's so easy to verify that I'm surprised you're even making this claim.

Well, to be fair, I'm surprised you're even trying to say this claim isn't true, when it's so easy to test for yourself.

If I prompt "Create a function with two arguments, a and b, which returns adding those two together", I'll get exactly what I specify. If I feel like it using u32 instead of u8 was wrong, I add "two arguments which are both u8", and now I get that.

Is this not the experience you get when you use LLMs? How does what you get differ from that?
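Concretely, that back-and-forth looks something like this (a Rust sketch, assuming the model defaulted to u32 on the first pass as described above; the function names are just illustrative):

  // Sketch of the two results described above.

  // First prompt: "a function with two arguments, a and b, which returns
  // adding those two together" -- the integer width was left to the model.
  fn add(a: u32, b: u32) -> u32 {
      a + b
  }

  // Refined prompt: "two arguments which are both u8".
  fn add_u8(a: u8, b: u8) -> u8 {
      a + b
  }

  fn main() {
      assert_eq!(add(2, 3), 5);
      assert_eq!(add_u8(2, 3), 5);
  }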

> 1. There's always ambiguity, or else you'll spend an eternity writing specs

There isn't though; at some point it does end. Whether it's worth going that deep into specifying the exact implementation is up to you and what you're doing; sometimes it is, sometimes it isn't.

> LLMs will always produce different results even if the spec is 100% unambiguous, for a huge variety of reasons, the main one being that their output is non-deterministic.

Again, it's so easy to verify that this isn't true, and it's also surprising you'd say this, because earlier you said there's "always ambiguity", yet somehow you also seem to know that you can be 100% unambiguous.

Like with "manual" programming, the answer is almost always "divide and conquer"; when you apply that with enough granularity, you can reach "100% unambiguity".

> And even then, the simple fact of "your context window is 80% full" can lead to things like "I've rewritten half of your code even though the spec only said that the button color should be green"

Yes, this is a real flaw: once you go beyond two messages, the models absolutely lose track almost immediately. The only workaround for this is constantly restarting the conversation. I never "correct" an agent that gets it wrong with more "No, I meant..."; instead I rewrite my first message so no corrections are needed. If your context goes beyond ~20% of what's possible, you're gonna get shit results, basically. Don't trust the advertised "X tokens context length", because "what's possible" is very different from "what's usable".


> If I prompt "Create a function with two arguments, a and b, which returns adding those two together", I'll get exactly what I specify. If I feel like it using u32 instead of u8 was wrong, I add "two arguments which are both u8", and now I get that.

This is actually a good example of how your spec will progress:

First pass: "Create a function [in language $X] with two arguments, a and b, which returns adding those two together"

Second pass: "It must take u8 types, not u32 types"

Third pass: "You are not handling overflows. It must return a u8 type."

Fourth pass: "Don't clamp the output, and you're still not handling overflows"

Fifth pass: "Don't panic if the addition overflows, return an error" (depending on the language, this could be "throw an exception" or return a tuple with an error field, or use an out parameter for the result or error)

For just a simple "add two numbers" function, the specification can easily exceed the actual code. So you can probably understand the skepticism when the task is not trivial, and depends on a lot of existing code.
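For reference, here's a Rust sketch of what that fifth-pass spec is actually asking for (the function name and error type are just one possible reading, not something from this thread):

  // Add two u8 values; return an error instead of panicking or clamping
  // when the result doesn't fit in a u8.
  fn add(a: u8, b: u8) -> Result<u8, String> {
      a.checked_add(b)
          .ok_or_else(|| format!("u8 overflow: {} + {}", a, b))
  }

  fn main() {
      assert_eq!(add(1, 2), Ok(3));
      assert!(add(200, 100).is_err()); // 300 does not fit in a u8
  }

A few lines of code, and the spec that produced them is already longer.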


So you do know how the general "writing a specification" part works; you just have the wrong process. Instead of iterating and adding more context on top, restructure your initial prompt to include the context.

DON'T DO:

First pass: "Create a function [in language $X] with two arguments, a and b, which returns adding those two together"

Second pass: "It must take u8 types, not u32 types"

INSTEAD DO:

First pass: "Create a function [in language $X] with two arguments, a and b, which returns adding those two together"

Second pass: "Create a function [in language $X] with two arguments, a and b, both using u8, which returns adding those two together"

----

What you don't want to do is add additional messages/context on top of "known bad" context. Instead, take the cue that the LLM didn't understand correctly as "I need to edit my prompt", not "I need to add more context after its reply to correct what was wrong". The goal should be to avoid anything bad entirely, not to correct it after the fact.

Together with this, you build up a system/developer prompt you can reuse across projects/scopes, one that follows how you code. In it, you add things as you discover they're needed, like "Make sure to always handle Exceptions in X way" or similar.
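As a rough sketch of the difference in what actually gets sent (plain data only, not any particular SDK; the Message type, the role names, and the prompt strings are just illustrations of the workflow above):

  // Illustrative only: the shape of the context in the two approaches.
  struct Message {
      role: &'static str,
      content: &'static str,
  }

  fn main() {
      // DON'T: keep the "known bad" turn and stack a correction on top of it.
      let appended = [
          Message { role: "user", content: "Create a function [in language $X] with two arguments, a and b, which returns adding those two together" },
          Message { role: "assistant", content: "fn add(a: u32, b: u32) -> u32 { a + b }" }, // the unwanted result
          Message { role: "user", content: "It must take u8 types, not u32 types" },
      ];

      // INSTEAD: throw that context away and resend a single revised prompt,
      // plus the reusable system/developer prompt you build up over time.
      let fresh = [
          Message { role: "developer", content: "Make sure to always handle Exceptions in X way" },
          Message { role: "user", content: "Create a function [in language $X] with two arguments, a and b, both using u8, which returns adding those two together" },
      ];

      for m in appended.iter().chain(fresh.iter()) {
          println!("[{}] {}", m.role, m.content);
      }
  }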

> For just a simple "add two numbers" function, the specification can easily exceed the actual code. So you can probably understand the skepticism when the task is not trivial, and depends on a lot of existing code.

Yes, please be skeptical; I am as well, which I guess is why I'm seemingly more effective at using LLMs than others who are less skeptical. Being skeptical is a benefit here, not a drawback.

And yes, it isn't trivial to verify work that others have done for you when you have a concrete idea of exactly how it should be. But just as I've managed to work with outsourced/contracted developers before, or collaborate with developers at the same company as me, I've learned to use LLMs in a similar way, where you have to review the code and ensure it follows the architecture/design you intended.


> INSTEAD DO:

> First pass: "Create a function [in language $X] with two arguments, a and b, which returns adding those two together"

> Second pass: "Create a function [in language $X] with two arguments, a and b, both using u8, which returns adding those two together"

So it will create two different functions (and LLMs do love to ignore anything that came before and create a lot of stuff from scratch again and again). Now what?


What? No, I think you fundamentally misunderstand what workflow I'm suggesting here.

You ask: "Do X". The LLM obliges and gives you something you don't want. At this point, don't accept/approve it; nothing has changed, you still have an empty directory, or whatever.

Then you start a brand new context with an iterated prompt: "Do X with Y", and the LLM tries again. If something is still wrong, repeat until you get what you're happy with, extract what you can into reusable system/developer prompts, then accept/approve the change.

Then you end up with one change, and one function, exactly as you specified it. Then if you want, you can re-run the exact same prompt, with the exact same context (nothing!) and you'll get the same results.

"LLMs do love to ignore anything that came before" literally cannot happen in this workflow, because there is nothing that "came before".


> No, I think you fundamentally misunderstand what workflow I'm suggesting here.

Ah. Basically meaningless monkey work of babysitting an eager junior developer. And this is for a simple thing like adding two numbers. See how it doesn't scale at all with anything remotely complex?

> "LLMs do love to ignore anything that came before" literally cannot happen in this workflow, because there is nothing that "came before".

Of course it can. Because what came before is the project you're working on. Unless of course you end up specifying every single utility function and every single library call in your specs. Which, once again, doesn't scale.


> See how it doesn't scale at all with anything remotely complex?

No, I don't. Does outsourcing not work for you with "anything remotely complex"? Then yeah, LLMs won't help you, because that's a communication issue. Once you figure out how to communicate, using LLMs even for "anything remotely complex" becomes trivial, but requires an open mind.

> Because what came before is the project you're working on.

Right, if that's what you meant, then yeah, of course they don't ignore the existing code; if there is a function that already does what it needs, it'll use that. If the agent/LLM you use doesn't automatically do this, I suggest you try something better, like Codex or Claude Code.

But anyways, you don't really seem like you're looking to improve, but instead to dismiss the better techniques available, so I'm not even sure why I'm trying to help you here. Hopefully at least someone who wants to improve comes across it so this whole conversation wasn't a complete waste of time.


> No, I don't.

Strange. For a simple "add two integers" you now have to do five different updates to the spec to make it unambiguous, restarting the work from scratch (that is, starting a new context) every time.

What happens when your task isn't adding two integers? How many iterations of the spec will you have to do before you arrive at an unambiguous one, and how big will it be?

> Once you figure out how to communicate,

LLMs don't communicate.

> Right, if that's what you meant, then yeah, of course they don't ignore the existing code; if there is a function that already does what it needs, it'll use that.

Of course it won't since LLMs don't learn. When you start a new context, the world doesn't exist. It literally has no idea what does and does not exist in your project.

It may search for some functionality given a spec/definition/question/brainstorming skill/thinking or planning mode. But it may just as likely not, because there's no actual proper way for anyone to direct it, and the models don't have learning/object permanence.

> If the agent/LLM you use doesn't automatically do this, I suggest you try something better, like Codex or Claude Code.

The most infuriating thing about these conversations is that people hyping AI assume everyone else but them is stupid, or doing something incorrectly.

We are supposed to always believe people who say "LLMs just work", without any doubt, on faith alone.

However, people who do the exact same things, use the exact same tools, and see all the problems for what they are? Well, they are stupid idiots with skill issues who don't know anything and probably use GPT 1.0 or something.

Neither Claude nor Codex is a magic silver bullet. Claude will happily reinvent any and all functions it wants, and has been doing so since the very first day it was unleashed onto the world.

> But anyways, you don't really seem like you're looking to improve, but instead to dismiss the better techniques available

Yup. Just as I said previously.

There are some magical techniques, and if you don't use them, you're a stupid Luddite idiot.

It doesn't matter that the person talking about these magical techniques completely ignores and misses the whole point of the conversation and is fully prejudiced against you. The person who needs to improve, for some vague, condescending definition of improvement, is you.


> LLMs don't communicate.

Similarly, some humans seem to be unable to either. The problem is, you need to be good at communication to use LLMs effectively, and judging by this thread, it's pretty clear what the problem is. I hope you figure it out someday, or just ignore LLMs; no one is forcing you to use them (I hope, at least).

I don't mind what you do, and I'm not "hyping LLMs"; I see them as tools that are sometimes applicable. But even to use them in that way, you need to understand how to use them. But again, maybe you don't want to, and that's fine too.


"However, people who do the exact same things, use the exact tools, and see all the problems for what they are? Well, they are stupid idiots with skill issues who don't know anything and probably use GPT 1.0 or something."

Perfectly exemplified


Yeah, a summary of some imaginary arguments someone else made (maybe?), quoted back at me even though I never said any of those things? Fun :)


The "imaginary arguments" in question:

- "If the agent/LLM you use doesn't automatically does this, I suggest you try something better, like Codex or Claude Code."

- "you don't really seem like you're looking for improving"

- "Hopefully at least someone who wants to improve comes across it so this whole conversation wasn't a complete waste of time"

- "judging by this thread, it's pretty clear what the problem is. I hope you figure it out someday"

- "you need to understand how to use them. But again, maybe you don't want"

Aka what I said previously.

At this point, adieu.



