
You're probably thinking of another post (https://xania.org/202512/11-pop-goes-the-weasel-er-count) where an entire loop was optimized to a single instruction

Care to elaborate? Why do you think it's bullshit?

> Compared to humans, LLMs have effectively unbounded training data. They are trained on billions of text examples covering countless topics, styles, and domains. Their exposure is far broader and more uniform than any human's, and not filtered through lived experience or survival needs.

I think it's the other way round: humans have effectively unbounded training data. We can count exactly how much text any given model saw during training. We know exactly how many images or video frames were used to train it, and so on. Can we count the amount of input humans receive?

I can look at my coffee mug from any angle I want, I can feel it in my hands, I can sniff it, lick it and fiddle with it as much as I want. What happens if I move it away from me? Can I turn it this way, can I lift it up? What does it feel like to drink from this cup? What does it feel like when someone else drinks from my cup? The LLM has no idea because it doesn't have access to sensory data and it can't manipulate real-life objects (yet).


A big challenge is that the LLM cannot selectively sample its training set. You don't forget what a coffee cup looks like just because you only drank water for a week. LLMs, on the other hand, will catastrophically forget anything in their training set when the training set does not have a uniform distribution of samples in each batch.

This is a fair criticism we should've addressed. There's actually a nice study on this: Vong et al. (https://www.science.org/doi/10.1126/science.adi1374) hooked up a camera to a baby's head so it would get all the input data a baby gets. A model trained on this data learned some things babies do (eg word-object mappings), but not everything. However, this model couldn't actively manipulate the world in the way that a baby does and I think this is a big reason why humans can learn so quickly and efficiently.

That said, LLMs are still trained on significantly more data pretty much no matter how you look at it. E.g. a blind child might hear 10-15 million words by age 6 vs. trillions for LLMs.


> hooked up a camera to a baby's head so it would get all the input data a baby gets.

A camera hooked up to the baby's head is absolutely not getting all the input data the baby gets. It's not even getting most of it.


> LLMs are still trained on significantly more data pretty much no matter how you look at it ... 10-15 million words ... vs trillions for LLMs

I don't know how to count the amount of words a human encounters in their life, but it does seem plausible that LLMs deal with orders of magnitude more words. What I'm saying is that words aren't the whole picture.

Humans get continuous streams of video, audio, smell, location and other sensory data. Plus, you get data about your impact on the world and the world's impact on you: what happens when you move this thing? What happens when you touch some fire? LLMs don't have this yet, they only have abstract symbols (words, tokens).

So when I look at it from this "sensory" perspective, LLMs don't seem to be getting any data at all here.


While an LLM is trained on trillions of tokens to acquire its capabilities, it does not actively retain or recall the vast majority of it, and often enough is not able to make deductive inferences either (e.g. X owns Y does not necessarily translate to Y belongs to X).

The acquired knowledge is a lot less uniform than you’re proposing and in fact is full of gaps a human would never make. And more critically, it is not able to peer into all of its vast knowledge at once, so with every prompt what you get is closer to an “instance of a human” than “all of humanity” as you might think of LLMs.

(I train and dissect LLMs for a living and for fun)


I think you are proposing something that's orthogonal to the OP's point.

They mentioned that the amount of training data is much higher for an LLM; an LLM's recall not being uniform was never in question.

No one expects compression to be without loss when you scale below knowledge entropy that exists in your training set.

I am not saying LLMs do simple compression, just pointing out a mathematical certainty.

(And I think you don't need to be an expert in creating LLMs to understand them, though I think a lot of people here have experience with that as well, so I find the additional emphasis on it moot).


The way I understood OP’s point is that because LLMs have been trained on the entirety of humanity’s knowledge (exemplified by the internet), then surely they know as much as the entirety of humanity. A cursory use of an LLM shows this is obviously not true, but I am also raising the point that LLMs are only summoning a limited subset of that knowledge at a time when answering any given prompt, bringing them closer to a human polymath than an omniscient entity, and larger LLMs only seem to improve on the “depth” of that polymath knowledge rather than the breadth of it.

Again just my impression from exposure to many LLMs at various states of training (my last sentence was not an appeal to expertise)


There's only so much information content you can get from a mug though.

We get a lot of high quality data that's relatively the same. We run the same routines every day, doing more or less the same things, which makes us extremely reliable at what we do but not very worldly.

LLMs get the opposite: sparse, relatively low quality, low modality data that's extremely varied, so they have a much wider breadth of knowledge but they're pretty fragile in comparison since they get relatively little experience on each topic and usually no chance to affirm learning with RL.


Yep, LLMs have a greater breadth of knowledge, but it's shallow. Humans are able to achieve much greater depth because they have more data about the subject.

Not only that, but humans also have access to all of the "training data" of hundreds of millions of years of evolution baked into our brains.

I don’t think the amount of data is essential here. The human genome is only around 750 MB, much less than current LLMs, and likely only a small fraction of it determines human intelligence. On the other hand, current LLMs contain immense amounts of factual knowledge that a human newborn carries zero information about.

Intelligence likely doesn’t require that much data, and it may be more a question of evolutionary chance. After all, human intelligence is largely (if not exclusively) the result of natural selection from random mutations, with a generation count that’s likely smaller than the number of training iterations of LLMs. We haven’t found a way yet to artificially develop a digital equivalent effectively, and the way we are training neural networks might actually be a dead end here.


That just says "low Kolmogorov complexity". All the priors humans ship with can be represented as a relatively compact algorithm.

Which gives us no information on computational complexity of running that algorithm, or on what it does exactly. Only that it's small.

LLMs don't get that algorithm, so they have to discover certain things the hard way.


Which must be doing some heavy lifting.

Humans ship with all the priors evolution has managed to cram into them. LLMs have to rediscover all of it from scratch just by looking at an awful lot of data.


OTOH, all that data is built on patterns that evolved from many years of evolution, so I think the LLM benefits from that evolution also.

Sure, but LLMs are trying to build the algorithms of the human mind backwards, converging on similar functionality based on just some of the inputs and outputs. This isn't an efficient or a lossless process.

The fact that they can pull it off to this extent was a very surprising finding.


It’s unlikely sensory data contributes to intelligence in human beings. Blind people take in far, far less sensory data than sighted people, and yet are no less intelligent. Think of Helen Keller - she was deafblind from an early age, and yet was far more intelligent than the average person. If your hypothesis is correct, and development of human intelligence is primarily driven by sensory data, how do you reconcile this with our observations of people with sensory impairments?

Blind people do tend to have less spatial intelligence though, significantly so. Not very nice to say like that, and of course they often develop heightened intelligence in other areas, but we do consider human-level spatial reasoning a very important goal in AI.

People with sensory impairments from birth may be restricted in certain areas, on account of the sensory impairment, but are no less generally cognitively capable than the average person.

> but are no less generally cognitively capable than the average person

I think this would depend entirely on how the sensory impairment came about, since most genetic problems are not isolated, but carry a bunch of other related problems (all of which can impact intelligence).

Lose your eye sight in an accident? I would grant there is likely no difference on average.

Otherwise, the null hypothesis is that intelligence (and a whole host of other problems) are likely worse, on average.


> It’s unlikely sensory data contributes to intelligence in human beings.

This is clearly untrue. All information a human ever receives is through sensory data. Unless your position is that the intelligence of a brain that was grown in a vat with no inputs would be equivalent to that of a normal person.

Now, does rotating a coffee mug and feeling its weight, seeing it from different angles, etc. improve intelligence? Actually, still yes, if your intelligence test happens to include questions like “is this a picture of a mug” or “which of these objects is closest in weight to a mug”.


>Unless your position is that the intelligence of a brain that was grown in a vat with no inputs would be equivalent to that of a normal person.

Entirely possible - we just don’t know. The closest thing we have to a real world case study is Helen Keller and other people with significant sensory impairments, who are demonstrably unimpaired in a general cognitive sense, and in many cases more cognitively capable than the average unimpaired person.


I think you are trying to argue for a very abstract notion of intelligence that is divorced from any practical measurement. I don’t know how else to interpret your claim that inputs are divorced from intelligence (and that we don’t know if the brain in a jar is intelligent).

This seems like a very philosophical standpoint, rather than a practical one. And I guess that's fine, but I feel like the implication is that if an LLM is in some way intelligent, then it was exactly as intelligent before training. So we are talking about "potential intelligence"? Does a stack of GPUs have "intelligence"?


Intelligence isn’t rigorously defined or measurable, so any conversation about the nature of intelligence will be inherently philosophical. Like it or not, intelligence just is an abstract concept.

I’m trying to illustrate that the constraints that apply to LLMs don’t necessarily apply to humans. I don’t believe human intelligence is reliant upon sensory input.


It can’t be both. If intelligence is this abstract and philosophical then the claims about inputs not being relevant for human intelligence are meaningless. It’s equally meaningless to say that constraints on LLM intelligence don’t apply to human intelligence. In the absence of a meaningful definition of intelligence, these statements are not grounded in anything.

The term cannot mean something measurable or concrete when it’s convenient, but be vague and indefinable when it’s not.


> Gaussian mixture models

In what fields did neural networks replace Gaussian mixtures?


The acoustic model of a speech recognizer used to be a GMM, which mapped a pre-processed acoustic signal vector (generally MFCCs, Mel-Frequency Cepstral Coefficients) to an HMM state.

Now those layers are neural nets, so acoustic pre-processing, GMM, and HMM are all subsumed by the neural network and trained end-to-end.

One early piece of work here was DeepSpeech2 (2015): https://arxiv.org/pdf/1512.02595
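
To make the old pipeline concrete, here's a minimal sketch of the GMM-style acoustic scoring step (the file name and parameters are made up for illustration, and a real recognizer trains one GMM per HMM state, not a single one):

    import librosa
    from sklearn.mixture import GaussianMixture

    # Load a short utterance and extract MFCC frames (13 coefficients per frame).
    signal, sr = librosa.load("utterance.wav", sr=16000)       # hypothetical file
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13).T  # shape: (n_frames, 13)

    # A single GMM stands in for the acoustic model here; real systems fit one per HMM state.
    gmm = GaussianMixture(n_components=8, covariance_type="diag").fit(mfcc)

    # Per-frame log-likelihoods p(frame | state), which the HMM decoder then
    # combines with transition probabilities to find the best word sequence.
    frame_loglik = gmm.score_samples(mfcc)
    print(frame_loglik.shape)  # one score per frame

In the end-to-end neural setups that replaced this, the feature extraction, GMM, and (largely) the HMM are folded into one network trained directly on the audio.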


Interesting, thanks!

Is this new or somehow updated? HTML versions of papers have been available for several years now.

EDIT: indeed, it was introduced in 2023: https://blog.arxiv.org/2023/12/21/accessibility-update-arxiv...


From the paper...

Why "experimental" HTML?

Did you know that 90% of submissions to arXiv are in TeX format, mostly LaTeX? That poses a unique accessibility challenge: to accurately convert from TeX—a very extensible language used in myriad unique ways by authors—to HTML, a language that is much more accessible to screen readers and text-to-speech software, screen magnifiers, and mobile devices. In addition to the technical challenges, the conversion must be both rapid and automated in order to maintain arXiv’s core service of free and fast dissemination.


No, I mean _arXiv_ has had experimental support for generating HTML versions of papers for years now. If you visit arXiv, you'll see a lot of papers have generated HTML alongside the usual PDF, so I'm trying to understand whether the article discusses any new developments. It seems like it's not new at all.


It's kind of fun to compare this formulation with the seemingly contradictory official arXiv argument for submitting the TeX source [1]:

> 1. TeX has many advantages that make it ideal as a format for the archives: It is plain text, it is compact, it is freely available for all platforms, it produces extremely high-quality output, and it retains contextual information.

> 2. It is thus more likely to be a good source from which to generate newer formats, e.g., HTML, MathML, various ePub formats, etc. [...]

Not that I disagree with the effort and it surely is a unique challenge to, at scale, convert the Turing complete macro language TeX to something other than PDF. And, at the same time, the task would be monumentally more difficult if only the generated PDFs were available. So both are right at the same time.

[1] https://info.arxiv.org/help/faq/whytex.html#contextual


Working with both at the same time makes their strengths and pitfalls shine. It's like that dual-boot computer where you're constantly in the wrong OS.

HTML has better separation of concerns than LaTeX. LaTeX does typesetting a lot better than HTML. HTML layout can differ wildly for the same document. LaTeX documents are easier to lay out in the first place.

...etc...


There are pretty often problems with figure size and with sections being too narrow or wide (for comfortable reading). The PDF versions are more consistently well-laid-out.



Unfortunately, the whole point is that along with the fridge (or whatever tech) you purchase a billboard and willingly bring ads into your home. Of course ads on purchased devices should be mandatory, AND we customers will soon be expected to pay a "subscription fee" to temporarily unsubscribe from the ads. What kind of company would possibly make ads opt-in? IMO allowing the owner to turn off ads is a problem (for the company), not a solution.


> What kind of company would possibly make ads opt-in?

Amazon has for years: Kindle with ads on the lock screen is $20 cheaper than without: https://www.amazon.com/All-new-Amazon-Kindle-Paperwhite-glar...


That's fine. They can simply charge for the product what it costs to make, like they always did before, and if they find that nobody uses the "enable ads" button (because why would they?) they can save some maintenance effort by removing that button. They might even find the fridge doesn't need a wifi chip and can be cheaper.


Most products are not priced to cost.


When it's a competition among individual producers, we call it "a free market" and praise Hal Varian. When it's a competition among countries, it's suddenly threatening to "disrupt American AI markets and efforts". The obvious solution here is to pour money into LLM research too. Massive state funding -> provide SOTA models for free -> dominate the market -> reap the rewards (from the free models).


It's not like the US doesn't face similar accusations. One such case is the WTO accusing Boeing of receiving illegal subsidies from the US government. https://www.transportenvironment.org/articles/wto-says-us-ga...


We don't do that.


Reading the next sentence clears the confusion:

> SPy is not a "compiler for Python". There are features of the Python language which will never be supported by SPy by design. Don't expect to compile Django or FastAPI with SPy.


Yeah, but then don't say that SPy is an (interpreter and) compiler in the first place? Just say it's an interpreter.


It is a compiler. It is not a compiler for Python, because there are valid Python programs it can't compile and isn't intended to compile.


You can think of it like this:

SPy is a compiler. SPy is not a compiler for OCaml. SPy is not a compiler for COBOL. SPy is not a compiler for Python.


Are LLMs still trained by (variants of) stochastic GRADIENT descent? AFAIK what used to be called "backprop" is nowadays known as "automatic differentiation". It's widely used in PyTorch, JAX, etc.


Gradient descent doesn't matter here. Second order and higher methods still use lower order derivatives.

Backpropagation is reverse-mode automatic differentiation. They are the same thing.

And for those who don't understand what backpropagation is, it is just an efficient method to calculate the gradient for all parameters.
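
A tiny PyTorch sketch of that point (toy numbers, just for illustration): calling .backward() runs reverse-mode autodiff, i.e. backpropagation, and fills in the gradient of the loss for every tracked parameter, which is then what (stochastic) gradient descent consumes:

    import torch

    # Two toy "parameters"; requires_grad tells autograd to track them.
    w = torch.tensor(2.0, requires_grad=True)
    b = torch.tensor(0.5, requires_grad=True)

    x, y = torch.tensor(3.0), torch.tensor(7.0)
    loss = (w * x + b - y) ** 2           # simple squared error

    # Reverse-mode automatic differentiation == backpropagation:
    # one backward pass computes d(loss)/dw and d(loss)/db.
    loss.backward()
    print(w.grad, b.grad)                 # the gradients SGD (or Adam, etc.) would use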


Wow, I'm NOT upgrading to iOS 26, this looks terrible. Moreover, I see people complaining about poor performance and worsening battery life.


Certainly neither is true for everyone. I don't notice any difference in my iPhone's life or performance.

