Hacker News

I'm confused as to why this would see any improvement over time. Looking at the code, it's by default hitting the gpt-3.5-turbo API. Maybe I'm misremembering, but I thought I'd seen statements from people working at OpenAI claiming that the API is static and that we'd be informed of any changes to the underlying model. Is the model actually receiving updates?

edit: Looking at previous days, too, it doesn't exactly seem to be improving. I think we just got a lucky sampling.



Yes, the models are officially updated around every three months, with notice that you can still use the previous version for a time until it is decommissioned.

Some people claim there are also unannounced changes, but I can't vouch for that.

The daily variation is likely due to temperature, which is used to make responses less repetitive.
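To make the temperature point concrete, here's a minimal sketch of where that parameter sits in a chat completion request. `buildRequest` is a hypothetical helper, not the site's actual code:

```javascript
// Sketch of a chat completion request body (hypothetical helper;
// the site's real code may structure this differently).
function buildRequest(prompt, temperature = 1.0) {
  return {
    model: "gpt-4",
    messages: [{ role: "user", content: prompt }],
    // temperature near 0 => almost deterministic output;
    // higher values => more varied (and more "squiggly") results
    temperature,
  };
}

const req = buildRequest("Draw a unicorn in SVG", 0.7);
console.log(req.temperature); // 0.7
```

With a nonzero temperature, the same prompt can yield a different drawing each day even if the underlying model never changes.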


Wasn't there a study recently that tracked the performance of GPT over time and found a significant drop in quality? Did those drops occur at official model changes, or at other times (i.e. unannounced changes for safety or cost reduction)?

I mean, if I was OpenAI, I probably wouldn't make an announcement like "we've just quantized the model and increased our profit margins significantly! The only change on your end will be a slightly dumber model. (Don't worry! Most users won't even notice!)"


This one [1]? That tracks two distinct versions (0613 vs. 0314).

Also, IMO, the tasks they evaluate aren't useful (I rarely want my LLM to tell me whether 17077 is a prime number), and there's room for cherrypicking/survivorship bias. My guess is that OpenAI did something between 0314 and 0613 that shifted focus away from maths to other subjects.

[1] https://arxiv.org/pdf/2307.09009.pdf


I haven't tracked the quality but I do track the performance:

https://gpt-monitor.adamkdean.co.uk/

It fluctuates a lot but you can see trends.


The site linked in the OP is interesting because it gets a picture from GPT every day, so we can see for ourselves if there is any difference over time. We can come back tomorrow and see what it has produced. If it produces random squiggly lines again, we might assume that today's success was just a fluke.



It’s using GPT-4 by default, but we can’t know what it uses for real since that’s in the environment config.


It says "gpt-4-0613" on the web page as well. Why would they pretend it's GPT-4 but use GPT-3.5 in the background?


Submitter here. It's using gpt-4 and saving the model name that OpenAI returns, which lets us see the specific model snapshot used each time.

    "Env": [
        "VIRTUAL_HOST=gpt-unicorn.adamkdean.co.uk",
        "LETSENCRYPT_HOST=gpt-unicorn.adamkdean.co.uk",
        "HTTP_PORT=8000",
        "STORAGE_PATH=/data",
        "OPENAI_API_KEY=sk-**SNIP**",
        "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
        "NODE_VERSION=16.17.1",
        "YARN_VERSION=1.22.19"
    ],
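For anyone curious how that works: the API response carries the exact snapshot name (e.g. "gpt-4-0613") even when the request only says "gpt-4". A rough sketch, with `recordModel` and the response shape as illustrative assumptions:

```javascript
// The completion response includes a `model` field naming the exact
// snapshot that served the request (hypothetical helper shown here).
function recordModel(response) {
  return response.model; // e.g. "gpt-4-0613"
}

// Faked response object for illustration only.
const fakeResponse = { model: "gpt-4-0613", choices: [] };
console.log(recordModel(fakeResponse)); // "gpt-4-0613"
```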


I’m not saying they do, just that you can’t know from the code alone.


According to the author's blog post [1] the idea was that it "will use the latest gpt-4 model made available". Not sure if the code isn't up to date or this was changed in the meantime...

[1] https://adamkdean.co.uk/posts/gpt-unicorn-a-daily-exploratio...


No changes: it specifies gpt-4, which tracks the latest model.
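That is, a bare "gpt-4" is an alias that follows whatever snapshot OpenAI currently serves, while a dated name like "gpt-4-0613" pins one version. A toy check, assuming pinned snapshots carry a -MMDD suffix as the names in this thread do:

```javascript
// Distinguish a floating alias from a pinned snapshot by the date suffix
// (assumption based on the naming seen in this thread, e.g. "gpt-4-0613").
function isPinned(model) {
  return /-\d{4}$/.test(model);
}

console.log(isPinned("gpt-4"));      // false: floats to the latest snapshot
console.log(isPinned("gpt-4-0613")); // true: stays on the June 2023 snapshot
```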


The API layer receives fairly regular updates, but the model is (as I understand it) mostly static.

Within GPT there is an intentional randomness element called temperature, which is why you get different answers each time.

I could copy their prompt and ask GPT-4 to draw other things, but I'll probably just look at the next few unicorns from this site :)




