
FWIW I didn't like the Robot / Efficient mode because it would give very short answers without much explanation or background. "Nerdy" seems to be the best, except with GPT-5 instant it's extremely cringy like "I'm putting my nerd hat on - since you're a software engineer I'll make sure to give you the geeky details about making rice."

"Low" thinking is typically the sweet spot for me - way smarter than instant with barely a delay.



I hate its acknowledgement of its personality prompt. Try a series of back-and-forths and each response is like “got it, keeping it short and professional. Yes, there are only seven deadly sins.” You get more prompt performance than answer.


I like the term prompt performance; I am definitely going to use it:

> prompt performance (n.)

> the behaviour of a language model in which it conspicuously showcases or exaggerates how well it is following a given instruction or persona, drawing attention to its own effort rather than simply producing the requested output.

:)


Might be a result of using LLMs to evaluate the output of other LLMs.

LLMs probably get higher scores if they explicitly state that they are following instructions...


It's like writing an essay for a standardized test, as opposed to one for a college course or for a general audience. When taking a test, you only care about the evaluation of a single grader hurrying to get through a pile of essays, so you should usually attempt to structure your essay to match the format of the scoring rubric. Doing this on an essay for a general audience would make it boring, and doing it in your college course might annoy your professor. Hopefully instruction-following evaluations don't look too much like test grading, but this kind of behavior would make some sense if they do.


That's the equivalent of a performative male, so better call it performative model behaviour.


Pay people $1 an hour and ask them to choose A or B, which is more short and professional:

A) Keeping it short and professional. Yes, there are only seven deadly sins

B) Yes, there are only seven deadly sins

Also let all the workers know they are being evaluated against each other, and that if they diverge from the majority choice their reliability score may go down and they may get fired. You end up with some evaluations answered as a Keynesian beauty contest / Family Feud "survey says"-style guess instead of their true evaluation.


I can’t tell if you’re being satirical or not…



jfc thank you for the context


This is even worse on voice mode. It's unusable for me now.



