Hacker News

Just using common sense: if we had a genius who had tremendous reasoning ability, total recall, an unlimited lifespan, and unlimited patience, and he'd read what the current LLMs have read, we'd expect quite a bit more from him than what we're getting now from LLMs.

There are teenagers who win gold medals at the math olympiad - they've trained on < 1M tokens of math texts, never mind the ~70T tokens that GPT-5 appears to have been trained on. A difference of nearly eight orders of magnitude.

In other words, data scarcity is not a fundamental problem, just a problem for the current paradigm.



I think quantization is the simplest canary.

If we can reduce the precision of the model parameters by 2~32x without much perceptible drop in performance, we are clearly dealing with something wildly inefficient.

I'm open to the possibility that over-parameterization is essential to the training process, much like how MSAA/SSAA oversample the frame buffer to reduce aliasing in the final scaled result (also wildly inefficient but generally very effective). However, I think for more exotic architectures (spiking / time-domain) these rules don't work the same way. You can't backpropagate through a recurrent SNN, so much of the prevailing machine-learning mindset doesn't even apply.
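The "canary" above can be illustrated with a toy round-trip that assumes nothing about any real quantization library: map float weights to int8 and back with a single per-tensor scale, and note that the reconstruction error is bounded by half a quantization step. All numbers here are illustrative.

```python
# Toy post-training quantization: fp32 -> int8 -> fp32 round-trip.
# Real schemes (per-channel scales, GPTQ, etc.) are more sophisticated;
# this just shows the core trade of a 4x size reduction for bounded error.
import random

random.seed(0)
# Stand-in for a weight tensor; the 0.02 std-dev is an arbitrary choice.
weights = [random.gauss(0.0, 0.02) for _ in range(10_000)]

# Symmetric quantization: one scale maps the largest |weight| to 127.
scale = max(abs(w) for w in weights) / 127.0
quantized = [round(w / scale) for w in weights]      # int8 range [-127, 127]
dequantized = [q * scale for q in quantized]

# Round-to-nearest guarantees the error is at most half a step.
max_err = max(abs(w - d) for w, d in zip(weights, dequantized))
print(f"max reconstruction error = {max_err:.6f} "
      f"({max_err / scale:.2f} quantization steps)")
```

If the model's outputs barely change under this kind of lossy compression, most of those fp32 bits were not carrying information the model actually used.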


It’s not clear that the inefficiency of the current paradigm is in the neural net architectures. It seems just as likely that it’s in the training objective.


Right. The objective is "correctly predict the entire training set", where that training set contains literally everything. So the objective becomes to speak every human language and every programming language, to understand every topic, to master every weird sub-genre of culture. That's an inherently very inefficient training objective if you just want an AI that can do some specific tasks. It's the whole insight behind models specialized for summarization, text extraction, patch merging, etc.

And don't forget the noise. If you look at the Anthropic papers it's clear from the examples they give that the dataset is still incredibly noisy even after extensive cleaning efforts. A lot of those parameters are being wasted trying to predict garbage outputs from HTML scraping gone wrong.
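The objective being described can be sketched in a few lines. Below, a unigram frequency table stands in for a real network (an assumption for brevity): the loss is the average negative log-likelihood of every token in the stream, so garbage tokens are scored exactly like useful ones.

```python
# Minimal sketch of the pre-training objective: score the *entire*
# training stream by average negative log-likelihood. A unigram model
# stands in for a real network; the corpus is a toy.
import math
from collections import Counter

corpus = "the cat sat on the mat . the dog sat on the rug .".split()

counts = Counter(corpus)
total = sum(counts.values())

def p(token):
    # "Model" probability: just the empirical unigram frequency.
    return counts[token] / total

# The training objective rewards predicting every token in the stream,
# relevant or not -- scraped garbage included.
loss = -sum(math.log(p(t)) for t in corpus) / len(corpus)
print(f"avg NLL over the whole stream = {loss:.3f} nats")
```

Appending noise tokens to `corpus` raises the loss the model must spend capacity driving down, which is the point about wasted parameters.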


Now consider that the genius cannot physically interact with the world or the people therein, and uses her eyes only for reading text.


Yes - we train LLMs only on a subset of human communication, the part using written symbols (even voice has much, much more depth to it), while human brains train on the actual physical world.

Human students who have only learned some new words but have not (yet) even begun to really comprehend a subject will likewise just throw around random words and sentences that sound great but have no basis in reality.

Take the same sentence, for example, "We need to open a new factory in country XY". The internal model lighting up inside the brain of someone who has actually participated when this was done previously will be much deeper and larger than that of someone who only heard about it in their coursework. That depth is zero for an LLM, which only knows the relations between words and has no representation of the world. Words alone cannot even begin to represent the model built from real-world sensor data, which, on top of the direct input, also rests on many layers of compounded, already-internalized prior models. Nobody establishes that new factory as a newborn baby with a fresh neural net; even the newborn has inherited instincts, all based on accumulated real-world experience, including the complex structure of the brain itself.

Somewhat similar are the situations reported in comments like this one (a client or manager vastly underestimating the effort required to do something): https://news.ycombinator.com/item?id=45123810 The internal model of a task held by those far removed from actually doing it is very small compared to the internal models of those doing the work, so their attempts to gauge the required effort fall short spectacularly if they also lack that awareness.


Also the geniuses get beaten with a stick if they don't memorize and perfectly reproduce the text they've read.


I'm not sure what point you are trying to make. Are you saying that, in order to make LLMs better at learning, the missing piece is to make them capable of interacting with the outside world? Give them actuators and sensors?


> they've trained on < 1M tokens of math texts, never mind the 70T tokens that GPT5 appears to be trained on.

Somewhat apples and oranges given billions of years of evolution behind that human. GPT-5 started off as a blank slate.


This comparison is absolute nonsense.

"How could a telescope see saturn, human eyes have billions of years of evolution behind them, and we only made telescopes a few hundred years ago, so they should be much weaker than eyes"

"How can StockFish play chess better than a human, the human brain has had billions of years of evolution"

Evolution is random, slow, and does not mean we arrive at even a local optimum.


They're not saying that LLMs should be better than smart teenagers; they're saying that smart teenagers can solve some problems without needing massive amounts of data, so apparently those problems are technically solvable without those amounts of data.


Yes. It is astonishing that LLMs can solve problems that only a handful of very smart teenagers can solve, but LLMs do it by consuming a million times as much content as those teenagers. Running out of data is not a reason for despair.

Also consider that during training LLMs spend much less time processing, say, TAOCP (Knuth), or SICP (Abelson, Sussman, and Sussman), or Probability Theory (Jaynes) than the entirety of r/Frugal.

20 thick books turn a smart teenager into a graduate with an MSc. That's what, 10 million tokens?

When we read difficult, important texts, we reflect on them, make exercises, discuss them, etc. We don't know how to make an LLM do that in a way that improves it. Yet.
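A back-of-the-envelope check on that "10 million tokens" guess, with every per-book number an assumption rather than data:

```python
# Rough sanity check of the MSc-reading-list token count.
# All per-book figures below are assumptions, not measurements.
books = 20
pages_per_book = 700      # thick technical textbooks
words_per_page = 400
tokens_per_word = 1.3     # common BPE tokenizers average ~1.3 tokens/word

total_tokens = books * pages_per_book * words_per_page * tokens_per_word
print(f"~{total_tokens / 1e6:.1f}M tokens")  # single-digit millions
```

Whatever reasonable figures you plug in, the result stays in the single-digit millions, seven orders of magnitude below a frontier pre-training corpus.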


What comparison? I was arguing against a comparison.


To be fair, GPT-5 didn't start off as a blank slate. The architecture probably encodes a lot, much like how DNA encodes a lot. The former requires human writing to decompress into a human-like thing, the latter requires the Earth environment and a woman to decompress into a human organism.

But it's indeed apples and oranges. There's no good way to estimate the information encoded by the GPT architecture compared to human DNA. We just have to be empirical and look at what the thing can do.


[flagged]


Neural precursor cells literally move themselves from where they first differentiate to their final locations, to ensure specific neural structures and information dynamics in the developed brain. It's not declarative memory, but it's a memory of neural architecture etched out over evolutionary time.


They're born with neural hardware whose architecture has been optimized by evolution. Any choice of architecture imparts some inductive bias, making some problems easier and some problems harder to learn, and humans have the advantage that people with bad architectures (those not matching properties of the world we live in) were more likely to die or to not mate.

You're right that we don't call those inherited thought patterns memories; we call them reflexes, emotions, region-specific brain functions, etc.


This is such a wildly misleading statement that it borders on straight up incorrect.


You mean it's wrong? Or would that not have made you feel as clever?


Maybe human brains are constantly generating (and training on) massive amounts of synthetic data and that is how they get so smart?


You mean those like 8 hours of ~~nightmares~~ dreams I have every night?


I doubt it. Brains run at only a few operations per second; GPUs run at TFLOPS. There just isn't enough bandwidth.

My brain only needs to get mugged in a dark alley by a guy in a hoodie once to learn something.


This sentence really struck me in a particular way. Very interesting. It does seem like thoughts / the stream of consciousness are just your brain generating random tokens to itself and learning from them lol.


What experiment could be run to test this hypothesis?


Humans are not tabulae rasae, though. Evolution has hardwired our genius over millions of years.



