This might be a dumb question to ask, but what exactly is this useful for? B-Roll for YouTube videos? I'm not sure why so much effort is being put into something like this when the applications are so limited.
If you want to train a model to have a general understanding of the physical world, one way is to show it videos and ask it to predict what comes next, and then evaluate it on how close it was to what actually came next.
To really do well on this task, the model basically has to understand physics, and human anatomy, and all sorts of cultural things. So you're forcing the model to learn all these things about the world, but it's relatively easy to train because you can just collect a lot of videos and show the model parts of them -- you know what the next frame is, but the model doesn't.
Along the way, this also creates a video generation model - but you can think of this as more of a nice side effect rather than the ultimate goal.
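The training signal described above can be sketched minimally. Here a trivial "repeat the last frame" baseline stands in for the model, and mean squared error scores the prediction; the array shapes and the baseline predictor are illustrative assumptions, not anyone's actual training code:

```python
import numpy as np

def next_frame_loss(video: np.ndarray, context_len: int) -> float:
    """Score a trivial predictor on next-frame prediction.

    video: array of shape (frames, height, width, channels), values in [0, 1].
    The "model" here just repeats the last context frame -- a stand-in
    baseline, since the point is the training signal, not the architecture.
    """
    context = video[:context_len]   # frames the model is shown
    target = video[context_len]     # held-out next frame (known to the trainer)
    prediction = context[-1]        # baseline: repeat the last frame
    return float(np.mean((prediction - target) ** 2))

# A static video is perfectly predicted by the baseline (loss 0.0);
# a video where something moves is not, so the loss is positive.
static = np.zeros((4, 8, 8, 3))
moving = np.zeros((4, 8, 8, 3))
moving[3, 2:5, 2:5, :] = 1.0  # something appears in the final frame

print(next_frame_loss(static, 3))  # → 0.0
print(next_frame_loss(moving, 3))  # → positive
```

A real model replaces the repeat-last-frame predictor with a learned network, but the evaluation is the same: the trainer knows the next frame, the model doesn't, so any video collection becomes labeled training data for free.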
It doesn’t have to understand anything; none of these demonstrate reasoning or understanding.
All these models have just “seen” enough videos of those things to build a probability distribution for predicting the next step.
That isn’t bad, nor does it make them inherently dumb; a major component of human intelligence is built on similar strategies.
I couldn’t tell you what grammatical rules are broken in a text, or what physical rules are broken in a photograph, but I can tell something is wrong using the same methods.
Inference can take you far with large enough data sets, but sooner or later, without reasoning, you hit a ceiling.
This is true for humans as well: plenty of people go far in life on just memorization and replication, and do a lot of jobs fairly competently, but not everything.
Reasoning is essential for higher-order functions, and transformers are not the path to that.
That's like saying that your brain doesn't understand anything, it just analyzes the visual data coming in via your eyes and predicts the next step of reality
The brain also does that. It doesn’t do it exclusively, but we do it an awful lot.
We do an extensive amount of pattern matching and drop an enormous amount of sensory input very quickly, because we expect patterns and assume a lot about our surroundings.
Unlearning this is a hard skill to pick up. There are many forms of training, from martial arts to meditation, that attempt to achieve this.
The point is that this alone is not sufficient; the other core component is reasoning and understanding, and transformers learning on data are insufficient for that.
Parrots and a few other animals can imitate human speech very well; that doesn’t mean they understand the speech or are constructing it themselves.
Don’t get me wrong, I am not saying it is not useful (it is), but attributing reasoning and understanding to models that foundationally have no such building block is just being impressed by a speaking parrot.
I think people are just fundamentally not willing to attribute intelligence to things that can't have conversations. This is why it was possible for people to hold the incredible belief that babies or dogs don't feel pain. Once AI is given some long-term memory, all of these ideas that AI is just a parrot will suddenly be gone, and I personally think it will probably be pretty easy to give robots memories and their own personal motivations. All you have to achieve is to train them in real time; the rest is optimization — you want the training to make sense and have it not store/believe every single thing it is told, etc.
There is also the corollary: we tend to attribute intelligence to things merely because they can have conversations. That has been the case since the first golden era of AI in the 1960s.
Mimicking more patterns, like emotion and motivation, may make for a better user experience, but it doesn't make the machine any smarter, just a better mime.
Your thesis is that as we mimic reality more and more, the differences will not matter; this is an idea romanticized by popular media like Blade Runner.
I believe there are classes of applications, particularly if the goal is singularity or better-than-human superintelligence, where emulating human responses, no matter how sophisticated, won't take you there. Proponents may hand-wave this away as moving the goalposts, but it is only refining the tests to reflect the models of the era.
If the proponents of AI were serious about their claims of intelligence, then they should also be pushing for AI rights. There is no such serious discourse happening, only issues related to human data-privacy rights: what can be used by AI models for learning, or where the models can be allowed to work.
> If the proponents of AI were serious about their claims of intelligence, then they should also be pushing for AI rights. There is no such serious discourse happening
It's beginning to happen. Anthropic hired their first AI welfare researcher from Eleos AI, which is an organization specifically dedicated to investigating this question: https://eleosai.org/
Back when computers took up a whole room, you'd also have asked: "but what exactly is this useful for? Some simple calculations that anybody can do with a piece of paper and a pen?"
Think 5-10 years into the future; this is a stepping stone.
That's comparing apples to oranges though isn't it? Generating videos is the output of the technology, not the tech itself. It would be like someone asking "this computer that takes up a whole room printed out ascii art, what is this useful for?"
All the "creative" gen AI does things worse and more annoyingly than what exists now. The first computers did calculations faster and faster, with immediate utility (mostly for defense).
This is kind of an unfair comparison. What's the endpoint of generating AI videos? What can this do that is useful, contributes something to society, has artistic value, etc.? We can make educational videos with a script, but it's also pretty easy for motivated parties to do that already, and it's getting easier as cameras get better and smaller. I think asking "what's the point of this?" is at least fair.
The end point is enabling people to put into video what is in their mind. Like a word processor for video. When you remove the need to have a room full of VFX artists to make a movie, then anyone can make a movie. Whether this is beneficial is dubious, but that's an end goal if you are looking for one.
We're preparing to use video generation (specifically image+text => video so we can also include an initial screenshot of the current game state for style control) for generating in-game cutscenes at our video game studio. Specifically, we're generating them at play-time in a sandbox-like game where the game plays differently each time, and therefore we don't want to prerecord any cutscenes.
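As a rough illustration of that image+text => video flow (the field names, model id, and duration below are hypothetical placeholders, not any particular vendor's API), the game would snapshot the current frame and send it alongside the scene description:

```python
import base64
import json

def build_cutscene_request(scene_text: str, screenshot_png: bytes) -> str:
    """Assemble an image+text => video request payload for a cutscene.

    A screenshot of the current game state rides along as a style and
    continuity anchor, so the generated clip matches what the player
    currently sees. All field names here are hypothetical placeholders.
    """
    payload = {
        "model": "video-gen-model",  # placeholder model id
        "prompt": scene_text,
        "init_image": base64.b64encode(screenshot_png).decode("ascii"),
        "duration_seconds": 8,       # short in-game cutscene
    }
    return json.dumps(payload)

# At play-time: capture the framebuffer, describe the scene, fire the request.
req = build_cutscene_request(
    "The hero enters the ruined temple at dusk.",
    b"\x89PNG fake bytes for illustration",
)
print(json.loads(req)["prompt"])  # → The hero enters the ruined temple at dusk.
```

The interesting design choice is the init image: conditioning on a real frame is what keeps procedurally varying game states from producing cutscenes that clash visually with the session.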
Okay, so is the aim to run this locally on a client's computer or served from a cloud? How does the math work out where it's not just easier at that point to render it in game?
In its current state, it's already useful for B-roll, video backgrounds for websites, and any other sort of "generic" application where the point of the shot is just to establish mood and fill time.
But more than anything, it's useful as a stepping stone to more full-featured video generation that can maintain characters and story across multiple scenes. It seems clear that at some point tools like this will be able to generate full videos, not just shots.
This is a first step towards "the holodeck". You describe a scene and it exists. Imagine you could jump in and interact with it. That seems like something that could happen in 10-20 years.
You and your friends gather around the TV to watch a video about the time that you all traveled abroad and met a mysterious stranger. In the film, you witness each other take incredible risks, have intimate private conversations, and change in profound ways. Of course none of it actually happened; your voices and likenesses were fed into the movie generator. And did I mention in the film you’re driving expensive cars and wearing designer clothes?
Are they that limited? It's a machine that can make videos from user input: it can ostensibly be used wherever you need video, including for creative, technical and professional applications.
Now, it may not be the best fit for those yet due to its limitations, but you've gotta walk before you can run: compare Stable Diffusion 1.x to FLUX.1 with ControlNet to see where quality and controllability could head in the future.
Because it's pretty cool to be able to imagine any kind of scene in your head, put it into words, then see it be made into a video file that you can actually see and share and refine.
It's got a lot of potential as a way for Google to get paid for other people's skills and hard work, instead of the people who made all of that "data".
It’s kind of hilarious that anybody considers this “democratizing” media creation. How many people who need a video clip are going to be capable of running an open version of this themselves? The wonky “open” models aren’t even close. How much do you think these services are going to cost once the introductory period financed by race-to-the-bottom money stops? OpenAI already charges $200/mo if you want to be guaranteed more than 30-60 minutes of Advanced Voice. The introductory period exists solely to get people engaged enough to push through the blatant theft of millions of artists’ creative output, so they can have a beautiful tool to sell to Hollywood for a whole lot of money that’s still less than traditional VFX. Meanwhile, everyone else gets to dink around in the useless free models or the too-expensive-for-most prosumer tools; people with expensive video-card arrays or the functional equivalent will still be niche tinkering hobbyists with inferior tooling and models; and the skilled commercial artists still employed are being paid shit because of market forces. Great job, SV. Making the world a better place.