
to me the "search engine" case where it reproduces a specific training image seems like a failure mode that's distinct from normal operation

> it is assembling those from averages of the components found in its training set, not from some abstract imagination or understanding

how exactly are you so certain that the human brain handles abstract concepts any differently? please note that I'm not claiming that I myself know, but rather that you almost certainly do not know and thus are presenting an invalid argument

what is human imagination anyway?

> assembling those from averages of the components found in its training set

> slices of many items, more of a mash-up of the training items

> But give it something very specific ... it doesn't have enough input data variety to abstract out the person 'object' from the background

so is it abstracting or not? where's the line between that and a mere statistical mashup?



>>how exactly are you so certain that the human brain handles abstract concepts any differently?

Good question. At the very least, we have a far deeper understanding of physical reality. Humans would not unintentionally (i.e., except deliberately, for effect) produce images of people with three ears, or of a bikini-clad girl seated on a boat with her head and torso facing us, her butt somehow also facing us, and her thighs/knees facing away... yet I've seen both of these in the last week (sorry, I couldn't find the reference; it was a hilarious image that looked great for two seconds until you noticed).

I admit it is possible (though I think it unlikely) that this is a difference in quantity, not in kind.

One reason to doubt this is that Stable Diffusion was trained on 2.3 billion images. That is a vastly larger library than any human sees in a lifetime (viewing 2.3 billion images at one per second would take about 73 years). Yet even if you count every second of eyesight as 'training', a child under a tenth of that age, who has seen only 10% of those images, would not make the same kinds of mistakes.
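
A quick back-of-the-envelope check of that comparison, using only the figures quoted above:

    # Back-of-the-envelope check of the figures above; nothing here is measured data.
    SECONDS_PER_YEAR = 60 * 60 * 24 * 365.25

    training_images = 2.3e9  # Stable Diffusion's reported training-set size
    years_at_one_per_second = training_images / SECONDS_PER_YEAR
    print(f"{years_at_one_per_second:.1f} years")  # ~72.9 years

    # A child a tenth of that age, "trained" on one image per second of eyesight,
    # would have seen roughly 10% of the set (generously ignoring sleep).
    child_age_years = years_at_one_per_second / 10
    child_images = child_age_years * SECONDS_PER_YEAR
    print(f"{child_images:.1e} images")  # ~2.3e8, i.e. 10% of 2.3e9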

Plus, the neuron/synapse/neurotransmitter and brainstem/midbrain/cerebellum micro- and macro-architectures are vastly different from those of the computer models. So I think we can be confident that something different is happening.

>>so is it abstracting or not? where's the line between that and a mere statistical mashup?

Good question. There is definitely something we might call, or that at least resembles, abstraction. It's clearly able to separate the cutout of an astronaut in a spacesuit from the background, and it can evidently assemble those cutouts from different angles.

But it certainly does not have enough abstraction to understand even the correct relationships between the parts of a human. E.g., it seems to keep an astronaut's parts in the right relationship, but not a bikini-clad girl's (because of the greater variety of poses in the dataset?). There's no understanding of kinesiology, anatomy, or anything else an actual artist would have.

Could this be trained in? I expect so, but I think it would require multiple engines, not merely six orders of magnitude more training of the same type. Even if 10^6x more training eliminated these error types and the result even performed better than humans, I'm not sure it would be the same thing, just different and useful.

I'd want to see evidence that it was not merely cut-and-pasting components of images in useful ways, but generating them from an understanding of the sub-sub-components: "the thigh bone connects to the hip bone, the hip can rotate this far but not that far, the center of mass is supported here..." the way an artist builds up their images. Good artists study anatomy; these "AI"s haven't a clue that it exists.
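
To make concrete what that kind of explicit knowledge might look like, here's a purely hypothetical toy; the joint names and ranges are illustrative, not real biomechanics data:

    # Toy sketch of explicit anatomical constraints -- the kind of knowledge an
    # artist (or a rule-based system) has and a pixel-space model does not.
    # All joint ranges here are illustrative, not real biomechanics data.
    HIP_FLEXION_RANGE = (-20, 120)   # degrees: slight extension to deep flexion
    KNEE_FLEXION_RANGE = (0, 140)

    def pose_is_plausible(hip_angle_deg, knee_angle_deg):
        """Reject poses that violate the (made-up) joint limits above."""
        lo, hi = HIP_FLEXION_RANGE
        if not lo <= hip_angle_deg <= hi:
            return False
        lo, hi = KNEE_FLEXION_RANGE
        return lo <= knee_angle_deg <= hi

    print(pose_is_plausible(90, 45))    # True: sitting-like pose
    print(pose_is_plausible(90, -60))   # False: knee bent the wrong way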

>>to me the "search engine" case where it reproduces a specific training image seems like a failure mode that's distinct from normal operation

Au contraire, it seems that this merely exposes the normal operation. Insufficient images of that person prevented it from abstracting the person's components from the background, so it just returned the whole thing. IDK whether it would take a dozen, a hundred, or a thousand more images of the same person to work properly. But if they all had the same object in the background (e.g., a lamp), the "AI" would include it in its abstraction.

(but I could be wrong).


> it seems that this merely exposes the normal operation. Insufficient images of that person prevented it from abstracting the person components from the background

yes my point was that this total failure to abstract (or slice or average or whatever it is that it usually seems to do) appears to me to be neither the intended nor typical mode of operation

> children under 1/10 of that age, who have seen only 10% of those images would not make the same kinds of mistakes

but then children aren't being fed a stream of unrelated images. they're receiving a wide array of real time sensory input from an environment they're actively operating in

consider your examples of the lack of higher level understanding about how the parts of a human "fit together". what practical experience do these models have that could actually convey such an understanding? deriving a proper understanding of mechanics in 3D from one million independent 2D still frames of human hands performing various tasks seems like it should be extremely difficult at best

> Could this be trained in? I expect so, but I think it would require multiple engines

I think it requires a different sort of training algorithm entirely. work such as https://arxiv.org/abs/1803.10122 suggests to me that there might be little difference between the human ability to abstract and lossy compression. at the same time work such as https://arxiv.org/abs/2205.11502 makes it apparent that in many cases this sort of generalization simply does not happen the way we'd like
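
as a toy illustration of the "abstraction as lossy compression" idea (this is just PCA on synthetic data, not the models from either paper):

    # toy sketch of "abstraction as lossy compression": keep only the top principal
    # components of some synthetic data; coarse structure survives, detail is lost.
    # purely illustrative -- not the world-models or generalization papers above.
    import numpy as np

    rng = np.random.default_rng(0)

    # fake "images": 500 samples of 64-dim data generated from 3 latent factors
    latents = rng.normal(size=(500, 3))
    mixing = rng.normal(size=(3, 64))
    data = latents @ mixing + 0.1 * rng.normal(size=(500, 64))  # plus pixel noise

    centered = data - data.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)

    for k in (3, 10, 64):
        basis = vt[:k]                      # the k retained "abstractions"
        recon = centered @ basis.T @ basis  # compress, then decompress
        err = np.linalg.norm(centered - recon) / np.linalg.norm(centered)
        print(f"k={k:2d} components -> relative reconstruction error {err:.3f}")
    # k=3 already captures nearly everything; the rest is unmodelled "detail"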

> the neuron/synapse/neurotransmitter and brainstem/midbrain/cerebellum micro & macro-architectures are vastly different than the computer training models. So, I think we can be confident that something different is happening

something being architected differently doesn't necessarily mean that the higher level functionality is any different

moreover, in purely functional terms how do you propose to distinguish something that's different from something that's incomplete? ie a smaller piece of a larger whole? if someone constructs for example a passable digital model of the visual cortex of the mouse or human or other animal that's still only a single small piece of the whole

so who is to say, and how would we tell, whether we have or haven't achieved a meaningful form of abstraction versus merely averaging bits of the training set together? at this point I'm not actually clear where the line between those two things even lies


>>neither the intended nor typical mode of operation

Yup, certainly not intended, although I see it as the typical response at the edges of the data set; objects with too few varied representations will always fail this way. It seems square-cube-ish: there will always be a 'volume' of well-covered training data and a 'surface' of sparse data, so maybe the problem isn't severe.

>> deriving a proper understanding of mechanics in 3D from one million independent 2D ...extremely difficult at best

Yup, this is definitely part of how it is different. Doing the full training set with stereographs would likely improve it, but it would improve even more to have the same objects manipulated by robots with the feedback integrated. Considering the 3.5 billion parameters of DALL-E, 4.6 billion for Imagen, and 890 million for Stable Diffusion, how many parameters would be needed to integrate stereo vision and robotic feedback? 3.5 billion squared, or cubed? Would that be enough just scaled up, or do we need to qualitatively change the structure?
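
Just to put numbers on that question (the parameter counts are the ones quoted above, taken at face value; the arithmetic is the only point):

    # Arithmetic on the parameter counts quoted above, taken at face value.
    params = {
        "DALL-E": 3.5e9,
        "Imagen": 4.6e9,
        "Stable Diffusion": 8.9e8,
    }
    for name, p in params.items():
        print(f"{name}: {p:.1e} params; squared: {p**2:.1e}; cubed: {p**3:.1e}")
    # 3.5e9 squared is ~1.2e19 parameters -- far beyond anything trainable today,
    # which is part of why "just scale it up" is a questionable answer.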

>>I think it requires a different sort of training algorithm entirely.

Agree 100%. I think these engines are part of the solution, but not the whole. I expect we'll need multiple different kinds of training models, and then methods to integrate them and correlate their 'knowledge'. E.g., figuring out how one part of a moderately complex object (e.g., a human) hides another part in certain positions (e.g., hand behind back) is trivial for a 3D modelling system, but even the massive 2D ones often get it wrong.
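
A minimal sketch of why occlusion is trivial once the geometry is explicit (a toy depth test with a sphere standing in for the torso; nothing to do with how the diffusion models actually work):

    # Toy illustration: with explicit 3D geometry, "is the hand hidden behind the
    # back?" reduces to a depth comparison along the camera ray -- no learning needed.
    import numpy as np

    def is_occluded(point, blocker_center, blocker_radius, camera):
        """Return True if a sphere (stand-in for the torso) blocks the camera's
        line of sight to `point` (stand-in for the hand)."""
        direction = point - camera
        dist_to_point = np.linalg.norm(direction)
        direction = direction / dist_to_point
        # Closest approach of the viewing ray to the blocker's center.
        t = np.dot(blocker_center - camera, direction)
        closest = camera + t * direction
        hits_blocker = np.linalg.norm(closest - blocker_center) < blocker_radius
        # Occluded only if the blocker lies between the camera and the point.
        return hits_blocker and 0 < t < dist_to_point

    camera = np.array([0.0, 0.0, -5.0])
    torso = np.array([0.0, 0.0, 0.0])      # sphere of radius 0.5 at the origin
    hand_in_front = np.array([0.0, 0.0, -1.0])
    hand_behind = np.array([0.0, 0.0, 1.0])

    print(is_occluded(hand_in_front, torso, 0.5, camera))  # False: hand is visible
    print(is_occluded(hand_behind, torso, 0.5, camera))    # True: torso hides the hand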

>>being architected differently doesn't necessarily mean that the higher level functionality is any different

Definitely true. Parallel evolution, electric vs. ICE-powered cars, etc. The question is when we've achieved the same level of functionality.

>>how do you propose to distinguish something that's different from something that's incomplete?...achieved a meaningful form of abstraction versus merely averaging bits of the training set together? at this point I'm not actually clear where the line between those two things even lies

YES, excellent question. Especially since these models don't do much explaining of their inner workings. Humans haven't fully figured out our own inner workings either.

It's looking right now like AI that works differently from us will arrive faster than biomimicry-based AI, partly because we still don't understand the biology at a deep enough level. IDK if it'll stay that way.

I remember discussions long ago with a scientist who worked on AI for early Mars missions and how they'd move their machines. He described the algorithms for tracking the world and their machine and adjusting its motion, with the team assuming they were re-creating the way humans do it. From my experience as an international-level athlete and a neuroscience minor in college (inspired by my sport experiences), I could tell that his methods were nothing like how biological systems work. Seeing Google's self-driving car go around a racetrack was truly impressive, but from my sportscar-racing training & experience I could instantly tell it was accomplishing the task nothing like any human would, although it achieved a competent level of performance (in a limited setting).

How do we draw the line? It may come down to the kinds of clever tests built by child and animal behaviorists to study subjects who can't self-report on their state, or on whether they've actually figured something out.

That said, I don't think it's impossible for an AI to end up exceeding our capabilities by using different methods. Kind of like Paul Bunyan vs the chainsaw.

(BTW, thanks for the lively discussion; it's a pleasure to be pushed to define my thoughts better, and I've learned; happy to keep it going)



