
It has become common knowledge that GPT4 (and also 3.5) have problems with deterministic outputs (even at T=0). So what we're seeing here is just the effect of random sampling, not any actual change to the model itself. If you scroll down, you'll see other close attempts by the exact same model that could already be counted as a win depending on who you ask.

Edit: This comment section is a super fascinating case study on the inherent flaws in human cognition. Especially when it comes to seeing patterns in random noise. The fact that some people believe that the model really has to have changed in the past few days is amazing, because if you've kept up with the GPT architecture and the way OpenAI does things (especially on the API), it is incredibly obvious that nothing has happened. But people who want to believe that something has happened will definitely also start to see something.



>This comment section is a super fascinating case study on the inherent flaws in human cognition. Especially when it comes to seeing patterns in random noise. The fact that some people believe that the model really has to have changed in the past few days is amazing

You need only to look at the discourse around the Tesla FSD superusers to see this: they report a glitch at an intersection one day, then believe the next day it was "fixed" by the AI.


Go into /r/chatgpt and /r/bing and it's a bit scary how many people anthropomorphize the models.


Honest question, why do you find it scary?

I agree that some people take it too far, but most seem to be using metaphor to abstract away the underlying complexities and facilitate conversation. I did the same thing back in college, anthropomorphizing cells and even molecules when I was learning about microbiology and biophysics (e.g., kinesins[1] are a family of postal workers who work hard to deliver packages to their clients in a timely manner, and they spend their free time going for strolls and practicing on their favorite tightropes). I now do it in my day-to-day work at an AI/ML shop to communicate not just what the entire pipeline is doing, but what individual layers or encoders are doing, or even what variables in an equation are doing. I find that my colleagues and I are better able to understand and remember concepts and ideas when they're communicated as part of an anthropomorphic story, but we scarcely forget that the things we're dealing with aren't human.

But maybe I'm missing the point, and you're really worried about that first, smaller group of people who have really swallowed the Kool-Aid. To that, I can say that I don't think their behavior exceeds the baseline craziness/weirdness that I've come to expect from humanity. I'm sure there are far more people who believe in astrology than who believe that we've achieved true humanlike artificial general intelligence, for example.

[1] https://youtu.be/y-uuk4Pr2i8


> Honest question, why do you find it scary?

What I find scary is how much trust people put into the answers.

Today, I saw someone saying "Look! ChatGPT can design my home solar electrical circuit". That kind of thing will lead to new Darwin awards being given out.

Thing is, if people will trust it to do that right, when will politicians write policy papers with it? Let's face it, they already are. Other people won't want to read those long papers, so they'll ask it to summarise it for them. At that point you have LLMs writing laws and reviewing laws. It's all fine as long as nobody enforces them.

Right?


Interesting that modern discourse is to use "scary" and "dangerous" and such words so much more. I wonder if it is related to the present rise in neuroticism and trigger warnings etc.

It's not particularly "scary" to me that people do that. I remember Boomer and Scooby Doo bots that people anthropomorphized, and those were warbots from barely 10 years ago.

I suppose, in today's parlance, "it's scary how much people use fear-oriented language for normal things".


Great comment. I take it the use of "scary" is more a tool to provoke a stronger reaction or animosity in the reader than the writer actually feeling "scared" after seeing people anthropomorphising AI.

It seems to be the current trend in communication nowadays: A race to induce the strongest reaction possible.

Why should someone be scared of that? We have anthropomorphised chairs, brooms and whatnot since the Walt Disney times, and I am sure before that in classic literature (I am not literate enough to know for certain).

I prefer to be 'amazed' or 'excited' about what is happening: it means that AI is getting to a point where people find it more 'relatable'. We are getting to that point in our technology development. The number of things we will be able to do with a technology we can interact with that seamlessly is great.


> Interesting that modern discourse is to use "scary" and "dangerous" and such words so much more. I wonder if it is related to the present rise in neuroticism and trigger warnings etc.

I doubt it’s trigger warnings because they usually don’t “warn scary things”.

Also is there any data to back up that scary and fear is used more today than in the past? That seems unlikely.


> and those were warbots from barely 10 y ago.

Some people even anthropomorphized Eliza. To your point: if you cherry-pick enough and survey enough people, "some people" will do just about anything.


I noticed a similar behavior in Stable Diffusion forums, where people believe that the model they downloaded and are running offline is getting better at understanding their prompts.


Stable Diffusion most likely doesn't do this, but even a static model that took an embedding of all your historic prompts plus the current prompt as input would progressively give you better results as you use it.


Yes, but currently that would be a conscious choice (and extra intentional effort)


That's even worse for autonomous cars: there is so much data and noise that there is no way to reproduce the issue; it's complete chaos. Whereas with an LLM, if we control the seed, we can 100% reproduce the same result.


>> if we control the seed we can 100% reproduce the same result

No, that's the problem. You can't. You should be able to, but you can't. If you could, they wouldn't be scary. But at temperature zero we still get different results. Because no one gave enough of a shit when coding them, and no one gives enough of a shit to try to fix the issue.

This is what in any other industry would be called gross negligence.


A lot of stuff behind the scenes is going on to batch and route queries to GPT-4 models that are in perturbed states already[1]. This isn't gross negligence, this is basic capitalism. If you want sole access to a GPT-4 MoE cluster starting fresh, it's gonna cost you.

1. https://152334h.github.io/blog/non-determinism-in-gpt-4/


Interesting article. I can see how it makes sense for OpenAI or someone with a LLM to take advantage of any entropy that presented itself, as a shortcut to non-repetitive answers. I'm not sure if you're saying that these LLMs take on new characteristics as they get more randomized? Or just that it would be hard to get your hands on a fresh one to test the determinism of?


That's an OpenAI problem, not an LLM problem


>Whereas with a LLM if we control the seed we can 100% reproduce the same result

No, you can't. For the latest GPT models and the way they are run, this doesn't work anymore, making the experiment completely illogical. Some of the reasons are explained here pretty well: https://152334h.github.io/blog/non-determinism-in-gpt-4/


This sounds more like a bug than a feature.


100%. This person is trying to find patterns in random noise and believes they are meaningful. The original post hurts my head with its bad logic.


I'm sorry it hurts your head. I'm happy to sponsor a packet of paracetamol or some water if that helps. Ultimately, this is fun, not science. I'm just happy that after all these attempts, it finally got to a unicorn.


It got to a unicorn quite easily when the output was set to TikZ, as I've done here:

https://bobjansen.net/drawing-with-chatgpt/


That looks nothing like a unicorn, but the three triangles at the top right in the third picture look like the mask of some evil manga villain alien robot.


Well done for being a good sport, but I'm willing to bet that tomorrow's shape will not resemble a unicorn and you'll have to figure out how that works with your assumption that the model is improving.


I think we're going to need to wait a few more years, at least, to see any improvement. I expect to see 4 new models a year, before GPT-5 arrives. I'll just keep using the latest model and we can all reconvene in 1, 2, 5, 10 years.


The 'random noise' from a prompt like "Draw a unicorn in SVG" should still return unicorns.

This is absolutely fine: it should start showing unicorn-like drawings over a longer period, and potentially more refined ones over time as the model changes.


>The original post hurts my head with its bad logic.

Huh? What "original post"? This is an experiment, today the model drew something resembling a unicorn. Tomorrow we will see how the experiment goes again. I see no associated analysis, so what makes your "head hurt".


https://adamkdean.co.uk/posts/gpt-unicorn-a-daily-exploratio...

> The idea behind GPT Unicorn is quite simple: every day, GPT-4 will be asked to draw a unicorn in SVG format. This daily interaction with the model will allow us to observe changes in the model over time, as reflected in the output.


Sure, each day it will draw a unicorn. Each time the model changes, we'll have a new group of drawings. No they're not drawn at T=0 but even at T=0 GPT-4 is not deterministic. This isn't science -- this is just a bit of fun.


I believe the logic is fine. You seem to think multiple data points from the same version of the model (i.e. multiple samples per day, at least) would be necessary to judge the actual performance on each particular day.

That's worse logic. How would you visualize the very large sample you would get? Even with the current 118 samples (one per day) it's already difficult to find a pattern.

Would you "average" the samples?? That would not help IMO, you would need to average the score of each image, which requires either manually doing it or finding a reliable algorithm to do it automatically, but good luck with that.

So, a sample per day which allows clearly visualizing any change in the results over months and years is a valuable thing to do and I find it hard to improve on the methodology. You just need to keep in mind that one single picture from the sample is not enough, no one is going to disagree with that... but that doesn't make it "bad logic" and it's pretty thoughtless to say so.


I agree that trying to determine the distribution of these drawings is hard because it isn't a simple floating point number in its current form.

But maybe you could convert it to a linear monotonic measure?

You could pass it to an image recognition model and record the degree to which it thinks it is an:

animal, horse, unicorn

Basically if it fails to be a unicorn, see if it is a horse and if it fails to be a horse check if it is an animal. This gives you some type of linear measure if you place these three measures along the same axis adjacently. Then you can transform each image to a floating point number and then characterize the distribution.
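
One possible way to operationalize that, assuming each day's SVG has already been rasterized to a PNG and using CLIP as the recognition model (the file name and label wording below are made up for the example, not part of the project):

    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    labels = ["a drawing of an animal", "a drawing of a horse", "a drawing of a unicorn"]
    image = Image.open("unicorn-2023-07-18.png")  # hypothetical: the day's SVG rasterized to PNG

    inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        probs = model(**inputs).logits_per_image.softmax(dim=-1).squeeze(0)

    # Place the three classes adjacently on one axis (animal=1, horse=2, unicorn=3)
    # and take the probability-weighted position as a single scalar per image.
    score = float((probs * torch.tensor([1.0, 2.0, 3.0])).sum())
    print(dict(zip(labels, probs.tolist())), round(score, 3))

With one scalar per image you could then plot the daily scores over time and look for a shift whenever the model version changes.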


>You could pass it to an image recognition model and see record the degree to which it thinks it is...

We don't care what another algorithm "thinks". We want to see if what it draws is humanly interpretable as a unicorn.


Then you could use mechanical turk to have them rate each image to figure out how close it is to a Unicorn...


We could but this is also a fun project, which is why when I checked it today I was surprised that what I saw was not a turd with eyes (2023-05-18) nor a strange sea creature (2023-07-08) but something which, for the first time I think, actually resembled a unicorn.

I appreciate all the comments around determinism, sampling, scientific method, but as I said when I posted this just after building, it really is just for fun and to see, over time, if the general mish mash of outputs become more refined without any changes to the prompt (which doesn't aid it through CoT/ToT or improving on previous attempts etc.)


You don’t have to justify yourself to the HN peanut gallery :-)


Mechanical turk 'workers' use ChatGPT


Then simply train the model to predict whether a human can interpret it as a unicorn.


What if, instead of your ridiculous strawman, they believe in waiting to get consistency?


I could also believe in not having to have beliefs and just letting it run until I either die or run out of money, and the former may not result in an immediate shuttering of the project either


Expecting a convergent series is bad logic.


I don't see the bad logic.


He said he is "Asking GPT-4 to draw a unicorn every day to track changes in the model."

The variance he is seeing in the output is primarily the product of random chance rather than changes in the model. Specifically, this "unicorn" that he found today is likely just random chance, and there were no changes in the model between yesterday and today that led to it.

If he wanted to track changes in the model for real, he would have to ask multiple questions per day and try to infer some type of distribution characterization and then see if that changes over time. That is much more complex and not what he is doing.

This is just a curious experiment that doesn't mean much.


There's a linked blog post[0] that goes more into the methodology and reasoning.

"As mentioned in the hacker news discussion, the model doesn't change daily. [...] As OpenAI releases incremental updates, we'll see the model change automatically and be able to judge outputs. A single sample per day leads to quite different results, but that's fine I think. What I expect to see a year from now is an evolution of output. In variance: how varied are the outputs over a month?"

So yes, they could just produce 100 images with each new model release, but chose to spread those out over 1 per day instead. Is it the most scientific way to measure progress? No. Is it more fun and interesting to check back daily? Probably.

[0] https://adamkdean.co.uk/posts/gpt-unicorn-a-daily-exploratio...


Thanks, fun and light-hearted is the approach I went for


If you look at the examples from April, only 2 or 3 can be counted as unicorns. If in 3 months it's the reverse and only 2 or 3 can't be counted as unicorns, that would show a progressive improvement in the model. I agree we shouldn't take much from day N-1 to day N as there will be a lot of variance, but this can show us progressive improvement over model updates.

Perhaps in a few months half of the generated pictures will look like unicorns, perhaps in more months they will all be unicorns but 2 or 3 will look way more detailed instead of drawn by a 4 year old, etc. We just need to wait longer for the signal to break through the noise.


I do agree with your comment, especially this part:

> We just need to wait longer for the signal to break through the noise.

Currently, what we are observing is primarily noise with very little signal.


I don't think anyone claims this is an iterative linear measure, rather than a step function.

SVG can present arbitrarily complex graphics. The underlying display tech supports whatever fidelity GPT eventually matures into.

Has GPT plateaued? Will it be stuck forever at this hilariously naive level of competence at SVG art? Will it mature into Midjourney-level competence? I have no frigging clue. Since the token context is so small, I imagine it will put limits on the complexity of the SVG art pieces.

But I don't know. And it's fun to have a daily measure.

As a software engineer with a penchant for graphics, asking GPT to draw complex graphic shapes was one of the first tests I did with it. It's extremely interesting for me to collect progress data, no matter how noisy.

I have no idea if GPT will ever mature beyond these squiggles but if it does, this track record will have at least considerable artistic value, if nothing else.


I mean, how many humans can draw art by writing out svg? If that's not in the training set, I don't even see how GPT-4 gets much better at this over time.


And so, if we see that it /does/ get better, over the next few years, will that not lead us to ask /how/?

Let's think about it:

1. It has to output SVG. [1]

2. It is given a text-based representation of what it must draw. [2]

3. It must then somehow convert words -- the concept of a unicorn: equine with a horn, white, maybe rainbows? -- into SVG code, and attempt to convey their location, shape, colour, and appearance with code.

And keep in mind, this is just a token predictor. I doubt there is much data in its training that is this specific.

So while it's quite far from science, for me, it's a bit of fun and I get emails every now and then remarking on things like the turd of May (2023-05-18) and it lightens the mood every now and then, which I think ultimately, is worth it.

[1] System: You are a helpful assistant that generates SVG drawings. You respond only with SVG. You do not respond with text.

[2] User: Draw a unicorn in SVG format. Dimensions: 500x500. Respond ONLY with a single SVG string. Do not respond with conversation or codeblocks.

See: https://github.com/adamkdean/gpt-unicorn/blob/master/src/lib...
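
For anyone curious what that daily request boils down to, here is a minimal Python sketch of an equivalent call (the linked repo is not Python, and the model name and output file below are assumptions on my part, not taken from the project):

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    SYSTEM = ("You are a helpful assistant that generates SVG drawings. "
              "You respond only with SVG. You do not respond with text.")
    USER = ("Draw a unicorn in SVG format. Dimensions: 500x500. "
            "Respond ONLY with a single SVG string. "
            "Do not respond with conversation or codeblocks.")

    resp = client.chat.completions.create(
        model="gpt-4-0613",  # assumption: the pinned API model mentioned elsewhere in the thread
        messages=[{"role": "system", "content": SYSTEM},
                  {"role": "user", "content": USER}],
        # note: the project doesn't run at T=0, and even T=0 wouldn't guarantee determinism here
    )

    svg = resp.choices[0].message.content
    with open("unicorn-today.svg", "w") as f:
        f.write(svg)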


GPT-4 is really great at transferring concepts between domains.

That's one of the reasons why GPT, when it works, feels magical. SVG art does not need to be in its training set, as long as it knows how to present geometric concepts in SVG.

A good unicorn would require capabilities something like "the outline of a unicorn is composed of lines {...}" -> "export those lines as SVG".


Ok that makes perfect sense, thanks.


It takes a picture from GPT every day, so we'll be able to see if this was a fluke by looking at future days' outputs. I think it will work to track changes in the model.


Yes, this makes sense. You are agreeing with me. In order to see if it is just a fluke or not, you need a lot of samples so you can characterize the distribution yourself and try to see if it changed.


OK but I'm still not seeing any logic error.


That a single sample per day can track changes meaningfully when the noise floor is above the signal strength. And also the fact that today's unicorn-ish sample is meaningful at all.

It is a fun experiment though.


If you look closely, you can see that the model version is attached to each image, and the original blog post clearly states that asking the same model over multiple days is how they intend to track the variance. No bad logic there imo.


you're looking at it from the wrong time scale.

The difference over months or years is what is interesting.


Agree with this except for one data point. OpenAI does enhance/tweak/do something with the models at different levels. This can be determined by:

1. A change in the current model number (eg. gpt-3.5-turbo-0613)

2. On ChatGPT UI, the date at the bottom (eg. August 2023)

So it isn’t correct to say “it is incredibly obvious that nothing has happened”. Not that obvious to me.

A bit like how you can never tell for sure if Coca-Cola has tweaked their formula, or McDonald's has changed the recipe for its signature sauce. Only in this case, the model number going up or the date becoming more recent lends credence to something having changed.


The ChatGPT UI is indeed a wildcard. But it is irrelevant here because according to the github repo this page queries the API and OpenAI guarantees it doesn't change models with version number information (like gpt-4-0613, which is mentioned in the latest images). So this "experiment" would make a lot more sense if it was only run once every few months when the API actually offers new model versions and then generate a bunch of images for every model, instead of generating one single image every day (which is meaningless due to non-deterministic noise, even if the model had somehow changed since yesterday). That is also how it was done in the original study during the development of GPT4. I don't know how this experiment came up with its logic, unless they had a gross misunderstanding about how these models actually work (which admittedly seems common among tech interested folks here).


When I last spoke to Logan, he confirmed that there are no changes between the API models, so 0314 and 0613 are it. Those are the two models so far that I've collected SVGs for. With regard to batch vs daily: it makes no difference in terms of output, but by going daily, I don't need to track model changes and generate a new batch.

Also it's fun to see each daily unicorn.


That’s a good summary and it makes a lot of sense.


Agreed. But have you seen the original talk?

I believe he's trying to find a unicorn similar in style to the one generated by the original researcher.

It's so sad that OpenAI has a far more capable model internally that it can't give open access to because of safety (or any other argument).


I suspect the model is fundamentally the same underneath, but that various tricks like quantization are being performed in the deployed model to improve inference speed/cost at the expense of output quality.


Is it possible that inference cost is so high it’s viable?


Bear in mind I chose SVG rather than TikZ


Yes, nothing about GPT4 changed today. But that's not the goal of the project (although I can't speak for the intentions of the submitter here).

Currently there are two different GPT4 models represented in the samples, with quite significant quality difference between them. The quality (and variance in quality within a single model!) is interesting to see in such a comparison.


> But that's not the goal of the project (although I can't speak for the intentions of the submitter here).

(submitter here) You're correct, it's not the goal of the project. It would be fair to say there is no goal other than to ask GPT to draw a unicorn every day, and through it, create a talking point and potential fun for people who follow along.


This variance also exists among outputs from the same model. Just scroll down a bit and you'll see drastic quality differences with the exact same model.


Is there some reference or explanation for why the model is non-deterministic at temperature 0?


I'm not aware of anything concrete by OpenAI, but others have offered possible explanations.

One idea is that the cause is batched inference in sparse MoE (mixture of experts) models.

https://152334h.github.io/blog/non-determinism-in-gpt-4/

HN discussion: https://news.ycombinator.com/item?id=37006224


So in some sense the spectre attack for AI?


No


One important source of non-determinism is from using massive parallelism together with floating point arithmetic. In real math, a sum of numbers has an exact value that doesn't change if you change which order the numbers are added up in, but floating point arithmetic addition is not associative in the same way as real math, and parallelism can cause numbers to be added in a different order from execution to execution, which is one cause of non-determinism.
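
A toy illustration of the associativity point (this is just the arithmetic, nothing to do with how GPU kernels are actually written):

    import random

    # The same numbers summed in two different orders give (slightly) different
    # floating point results, because float addition is not associative.
    values = [random.uniform(-1e10, 1e10) for _ in range(100_000)]

    in_order = sum(values)
    shuffled = values[:]
    random.shuffle(shuffled)
    reordered = sum(shuffled)

    print(in_order == reordered)       # usually False
    print(abs(in_order - reordered))   # small but nonzero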


>and parallelism can cause numbers to be added in a different order from execution to execution

Parallelism doesn't magically add non-determinism of this kind unless you intentionally build it to be non deterministic. Nothing prevents you from processing an array in order in parallel.


However, the poster mentions parallelism in conjunction with floating point arithmetic, not parallelism by itself.


No. The problem is in a reduction op of some sort (sum or whatever). Since there's no guarantee of the order in which you receive the terms for the reduction, the non-determinism enters from the order in which the terms are reduced. Since float math isn't associative, there will be slight differences depending on the order, and these can amplify quickly over a deep net.

You would have to explicitly order the terms prior to reduction but you don't always have that level of control.
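
For illustration, the same effect shows up from the grouping of a reduction, not just the order of the inputs; a parallel reduction effectively sums in a tree-like grouping while a sequential sum goes left to right (toy sketch, not actual kernel code):

    import random

    def pairwise_sum(xs):
        # Tree-style grouping, roughly what a parallel reduction does.
        xs = list(xs)
        while len(xs) > 1:
            tail = [xs[-1]] if len(xs) % 2 else []
            xs = [xs[i] + xs[i + 1] for i in range(0, len(xs) - 1, 2)] + tail
        return xs[0]

    values = [random.uniform(-1e8, 1e8) for _ in range(10_000)]
    print(sum(values) == pairwise_sum(values))  # frequently False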


> Nothing prevents you from processing an array in order in parallel.

100% correct if you remove processing time from the equation.

In reality, Nvidia CUDA calculations run much faster if you let it schedule the order of floating point operations itself. This makes the ordering different from run to run.

This in turn causes the results to be non-deterministic.



Thanks!


If he removed the word "changes" it would've made sense. See what the odds are of it producing a unicorn. So far it's roughly 1 in 118, based on one test a day.


I'm not sure what you mean by random sampling. If I sample a random SVG, I wouldn't expect it to look like anything, let alone roughly like a unicorn.


I mean random sampling in the sense of how autoregressive language models like GPT generate sequences using token probabilities. It's not a random SVG, but the text sequence that is used to draw it suffers from the inherent non-determinism of the underlying model.


Relying on token probabilities seems like the exact opposite of random


The neural network just generates a set of probabilities for all tokens. The actual next token is then sampled from this set, which is always random for T>0 (and in the case of GPT4 even for T=0, because of the way the model itself works).
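
To make the sampling step concrete, here's a toy version of what happens after the network has produced its logits; the thread's point is that for GPT-4 even the logits themselves vary between runs, so T=0 greedy decoding alone doesn't buy you determinism:

    import numpy as np

    def next_token(logits, temperature, rng):
        if temperature == 0:
            # Greedy decoding: deterministic *if* the logits are reproducible.
            return int(np.argmax(logits))
        probs = np.exp(logits / temperature)
        probs /= probs.sum()
        return int(rng.choice(len(logits), p=probs))

    rng = np.random.default_rng(0)
    logits = np.array([2.0, 1.5, 0.3, -1.0])
    print([next_token(logits, 0.8, rng) for _ in range(10)])  # varies run to run
    print(next_token(logits, 0.0, rng))                       # always index 0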


"Random" is a loose term. We're discussing a technical subject. Talk about the distribution the random numbers are sampled from.


If the model understood the spatial relationships as well as the one that produced the original drawings of a unicorn, then variance in the choice of the next token should produce many similar but somewhat different images of unicorns. None of the images until today bear any resemblance to the original images.


What I find interesting is that there are aspects of what a "unicorn" represents, horns, limbs etc, in some of the drawings. Today is the first time I've seen it create something that actually looks like a representation of a unicorn.


I have not kept up with GPT architecture, other than noticing that other people have noticed that T=0 is clearly not deterministic for these things (and that that results from a bug, not an intentional feature). This much was obvious when the supposedly genius idiots rolled out GPT-3. It's wonderful to see the whole world bend over and just take it up the ass from a bunch of people who can't figure out why their code can't produce the same result twice; but it's quite natural for there to be a big cheerleading section on HN for any new technology that's (1) brilliant in theory, (2) deeply anti-human in practice, and (3) just needs a couple more revisions before it "fixes" a bunch of stuff.


> people who can't figure out why their code can't produce the same result twice

It is well known that the Nvidia parallel processing optimizations cause non-deterministic results.

It's easy to get deterministic results as far as that goes. They've just elected not to do that, since it would run much slower.


We’re pretty sure the nondeterminism is batching + mixture of experts + contention for specific experts


If by batching you mean bad code that fails to sort or relies on hardware to best-guess how things sort, then sure, that's called a bug. Also, "we're pretty sure" is rather self-important while also admitting total, abject failure to produce a deterministic result. You shouldn't blame yourself. A lot of people had the same feeling after staking their life on the revolutionary properties of NFTs.


Your incorrect assumption here is that determinism comes with no tradeoffs. There are a few outsider analyses on the topic, for example, this one on sparse MoEs [1]. If OpenAI uses sparse MoE as described, then determinism would be possible but inefficient.

Even if it's not sparse MoE, chances are high that the non-determinism is introduced somewhere purely as a performance optimization. The article speculates that OpenAI knows this well and hides it to protect the model internals.

[1] https://152334h.github.io/blog/non-determinism-in-gpt-4/


Even if a single version of GPT-4 were deterministic, wouldn't any change made to the model introduce enough noise to make it impossible to draw any conclusions from a few samples?


> This comment section is a super fascinating case study on the inherent flaws in human cognition

Like most of HN.


yeah, see also image-2023-04-25 which is way earlier and comes really close, surrounded by garbage



