OK but... what's happening here is SD is capturing textures and some of their relationships. It clearly has no understanding of the objects it's generating.
So the output is a kind of Dali-esque mushed up melted version of the original content.
It's entertaining because it's simultaneously referential and heavily distorted in unexpected ways.
You could use this as a manual art technique, and it would be interesting-ish.
I'm curious if it's capable of getting to the next stage, of understanding and distorting the actual design relationships in these objects to the point where it could deepfake catalog pages and they'd be indistinguishable from the real thing.
I suspect there's quite a gap between that stage and this one.
Feels like you'd make more progress with discrete models working together. At the very simple level, Stable Diffusion is pretty terrible at words and fonts, but it's a reasonable model of where [patterns of pixels that look like] text might fit and how big it should be. Add a second step that recognises Stable-Diffusion-generated pseudotext blocks and replaces them with GPT-generated text on the same prompt, set in an actual font scaled to match Stable Diffusion's attempted font, and you'll more likely get something that passes the "zoom in and try to read it" test. Though there may not be much correspondence between the images and the text.
A more complex arrangement adds a model which chooses high-level structure to fit a prompt (subprompts on suitable topics for the images and text on each page), a 'house style' model to pick fonts and copy/paste stuff like the RadioShack logo direct from its source material, plain old Stable Diffusion to draw lots of individual pictures ("cassette player deluxe RadioShack 1970"), plain old GPT to write the text, which is typeset to the 'house style' model's specification, and probably an "observer" model that forces new iterations of really bad pages.
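For fun, here's a purely hypothetical skeleton of that arrangement in Python. Every function below is a made-up stand-in for one of the models described above (layout, house style, Stable Diffusion, GPT, observer), not a real API:

    # Purely hypothetical: each function is a stand-in for an imagined model.
    def layout_model(prompt):       # high-level structure: subprompts per page slot
        return [{"image": prompt + ", product photo", "text": prompt + ", catalog copy"}]

    def house_style(slots):         # fonts, logo, assets pasted from source material
        return {"font": "Helvetica", "logo": "radioshack_logo.png", "slots": slots}

    def stable_diffusion(p):        # plain old Stable Diffusion
        return "<image for %r>" % p

    def gpt(p):                     # plain old GPT
        return "<copy for %r>" % p

    def looks_terrible(page):       # the "observer" that forces a redo of bad pages
        return False

    def make_page(prompt):
        page = None
        while page is None or looks_terrible(page):
            slots = [{"picture": stable_diffusion(s["image"]), "copy": gpt(s["text"])}
                     for s in layout_model(prompt)]
            page = house_style(slots)
        return page

    print(make_page("cassette player deluxe RadioShack 1970"))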
Great thing about this magazine creation process is it also tends to work better for humans than having one person do everything!
The analogy in my mind is the heyday of x86 ubiquity vs the coming revolution of many SoCs in one device.
Huge, all-purpose, massive-capacity NNs are the norm for the current advances in SOTA. This is probably because of a limit on development complexity: designing a complex system of richly interacting parts is hard, so as long as development of these sorts of tools is manual, it will be much easier just to shove more compute at the problem with singular, massive models. Compare this to the all-encompassing architectural hegemony of x86, which makes it relatively easy to write software that people can run on devices they already own; that reduction in developer effort and complexity is enormous for enabling rapid growth in the amount of software that can be created per unit time.
As ML design becomes more automated and functional, new possibilities open up regarding breaking out sub-tasks to individual, hyper-specialized tools and combining them into a resilient and capable whole that is more than the sum of its parts. That is the true power of automatic differentiation frameworks: when your system can be end-to-end differentiable throughout many devices and specialized functions, and you can train parts as easily as you can train the whole, you begin to observe the creation -- the growth, really -- of a new kind of digital cognition machine.
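A toy illustration of that "train parts as easily as the whole" point, sketched in PyTorch (chosen here purely for illustration): two small specialist modules composed into one differentiable pipeline, trained first piecewise and then end to end.

    import torch
    import torch.nn as nn

    # Two "specialist" modules composed into one end-to-end differentiable whole.
    encoder = nn.Sequential(nn.Linear(32, 16), nn.ReLU())
    decoder = nn.Linear(16, 8)
    pipeline = nn.Sequential(encoder, decoder)

    x, y = torch.randn(4, 32), torch.randn(4, 8)
    loss_fn = nn.MSELoss()

    # Train just one part: freeze the encoder, update only the decoder.
    for p in encoder.parameters():
        p.requires_grad_(False)
    opt = torch.optim.Adam(decoder.parameters(), lr=1e-3)
    opt.zero_grad(); loss_fn(pipeline(x), y).backward(); opt.step()

    # Or train the whole: unfreeze everything and backprop through both parts.
    for p in encoder.parameters():
        p.requires_grad_(True)
    opt = torch.optim.Adam(pipeline.parameters(), lr=1e-3)
    opt.zero_grad(); loss_fn(pipeline(x), y).backward(); opt.step()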
So the first "prompt" would be "write a series of prompts for an AI to use as steps to make variations of this image, and connect it together afterwards"
Thing is, SD in particular is relatively small (860M parameters), and quantity can be somewhat transformed into quality in this case. As Google's Parti landing page [0] conveniently demonstrates, more parameters with the same architecture yield more coherent output, including text and symbols. Given enough room (enough weights) in all parts of the model, starting from CLIP, it could even construct coherent text, as the compression ratio would be much lower. However, you'd need much beefier hardware to run it, and it might not be as efficient as a better architecture or a dataset/training skewed towards symbols.
"Looks like" and "resembles" are fairly orthogonal to "understands." Of one is looking for understanding, I would say Stable Diffusion is traveling in the wrong direction.
It's so interesting seeing how these txt2img models represent text. It's sort of like how someone who doesn't know how to read might represent language, as shapes instead of characters and words.
That said, it would be a fun experiment to try an img2txt model on these individual catalog items to find out what they actually do (or image search)
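A rough sketch of that experiment, assuming the BLIP captioning model available through Hugging Face transformers (the model name, file path, and exact API here are my assumption, not anything from the original post):

    from PIL import Image
    from transformers import BlipProcessor, BlipForConditionalGeneration

    # Caption one cropped catalog item with an off-the-shelf img2txt model.
    processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
    model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

    img = Image.open("catalog_item.png").convert("RGB")   # hypothetical crop of one item
    inputs = processor(images=img, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=30)
    print(processor.decode(out[0], skip_special_tokens=True))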
If the computer was a bicycle for the mind then Stable Diffusion is LSD for the computer.
Who cares how theoretically soulless it is under the matrix transforms, the fact is that this spits out the weird, and us humans chew it up and spit it right back.
I think this is the first new era in Art since postmodernism.
How long until what is currently photography is either full length video or 3d navigable worlds?
These Star Wars galleries are just phenomenal. I can't imagine it is long until you can apply these as filters to movies, for example taking Rogue One and applying a Fritz Lang or Stanley Kubrick filter.
Or, akin to Cars being a remake of Doc Hollywood with anthropomorphized cars, being able to say something like "I want to see a remake of the 2013 movie Rush, in the world of Zootopia, with 15% styling from Speed Racer, set in the 70's, with flying animals, and the two main rival protagonists being flying squirrels, one of which has a birth defect and a prosthetic wing."
The thing that I find the most LSD-like, both in the visuals it generates and in principle, is Google DeepDream.
It works by using an image classifier in reverse. So for example, you have a neural network that identifies bicycles: feed it an image, get the results, and feed them back into the image, boosting the bicycle-like characteristics of the image, then repeat the process a number of times. In the end, you get something that looks like the original image, but made of bicycle parts. It is commonly done with faces, and it can also be done on intermediate layers, amplifying more abstract details like geometric shapes.
Originally intended as a way to reveal the inner workings of a neural network (for research, debugging, etc...), it has also been used by artists for really trippy visuals.
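For anyone curious, the core loop is only a few lines. Here's a bare-bones sketch with PyTorch and a recent torchvision (real DeepDream adds octaves, jitter, input normalization, and smoothing on top; the class index and step size are just placeholders):

    import torch
    import torchvision.models as models

    model = models.vgg16(weights=models.VGG16_Weights.DEFAULT).eval()
    target_class = 444                        # one ImageNet class index; pick any
    img = torch.rand(1, 3, 224, 224, requires_grad=True)   # or start from a real photo

    for step in range(50):
        score = model(img)[0, target_class]   # how strongly the net "sees" that class
        model.zero_grad()
        if img.grad is not None:
            img.grad.zero_()
        score.backward()                      # gradient of that score w.r.t. the pixels
        with torch.no_grad():
            # Nudge the image toward whatever boosts the class score, then keep it valid.
            img += 0.05 * img.grad / (img.grad.abs().mean() + 1e-8)
            img.clamp_(0, 1)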
This is what I'm talking about, amazing work! Generative AI can unleash infinite un-realities -- shared reality is at drastic risk of being lost even further.
Baudrillard's vision of the hyperreal is becoming overwhelmingly true, and at an exponential rate. I envision that soon we won't even need historical documents because we can just auto-generate documents based on historical data. Those will then become the new historical documents, in a process that keeps folding in on itself, until the bubble of our immediate reality shrinks and shrinks until there is only the smallest part of the inner ear vaguely recalling that gravity is acting upon us, while the rest of our senses and thoughts are wrapped in the warm blanket of an auto-generated fugue.
When I think about procedurally generated entertainment that can be materialized on demand, I find myself thinking about the movie Strange Days.
Will the future be one of societal fragmentation where 1) nobody ever rewatches anything, they just generate more new stuff, 2) nobody watches each other's work ("hey watch my feed I just generated, it was awesome" followed by "yeah sure, someday") and just watches more new custom stuff generated for them, which just becomes 3) everybody is watching an endless feed of new procedurally generated levels like endless runner games, and 4) there is no longer any shared experience between people that they can realistically use as a foundation to communicate or interact?
OR will the future be MOSTLY the previous, but with a counterculture market for vintage (and modern) "authentic experiences," some of which will be black market? And then, as part of that counterculture demand, how much of the "authentic experience" content will be counterfeit, procedurally generated to look real? At that point the act of consuming counterfeit "authentic experiences" en masse just becomes a role-play archeology treasure hunting game.
But at the moment, people watch new TV shows when they come out, rather than watching old shows which are just as good. I think it's because they enjoy watching the same thing as other people.
Would you, if there were endless new episodes? Or if, upon rewatch, they could change slightly?
What about your kids' kids? Would they look back on older generations who are watching non-procedurally-generated content that never changes as weirdos?
For story arcs, the thrill is an evolving story with a coherent plot.
I suppose singular X-Files and Seinfeld episodes could easily fit into the procedurally generated category. That is a very exciting prospect. Seinfeld infinity.
There are many mechanisms to collectively decide things, so you aren't even close about that (Democracy, the market, representative government at every level, the United Nations, proxy votes for corporations, school boards, boards of governors, juries, HOAs, ballot initiatives and referendums, elections generally, zoning commissions, family meetings, 4 friends debating where to get dinner, really far too many to list here, if anything almost all decisions of any importance are made via some mechanism to collectively decide things).
There isn't a way to collectively decide things with absolute authority, but that is why things don't suck worse in general. If we make collective decisions we could force people to do some things which are more optimal to our goals. However, that assumes that we universally agree on what the goals are (not even close) and that the decisions won't actually be worse for the chosen goals (sometimes they will be far worse) and that the decision making process will never be irreversibly hijacked by some group for their own benefit (it absolutely will be). So you are not just wrong about this, your premise is incorrect and your conclusion does not follow from that premise even if it were.
Where did I say we would or could decide against it collectively? Individual people can decide for themselves not to engage with harmful technologies in the future, just as many do today with cell phones, computers, television, etc. Not everything has to be done by governments.
But will we? Those of us who have been alive long enough to know what life was like before any of this may choose not to. But our youngest generation, and those to come, may grow up not knowing any different.
Not suggesting it will happen, but it is an unappealing thought.
If anything, I think technology like this has gradually empowered people to rise above the autogenerated fugue that has historically always been a part of everyday life (though historically the human mind did a pretty good job generating that on its own thanks to widespread ignorance, superstition, and fear). Speaking personally, I find AI chatbots and image generators when I'm using them myself to be like a refreshing drink of cool fresh water compared with the sensation of being only drip fed or waterboarded by businesses wielding the technology to influence my behavior without my explicit input or full consent.
as individuals we have enough trouble asserting will over impulse as it is--hard to imagine us collectively deciding on anything like this when institutions are even more vulnerable to reactivity
I miss getting these catalogs! Someone should send out a weekly newsletter of random interesting gadgets linked to online stores where you can buy them, with a layout that perfectly mimics these old catalogs. I'd eat that up.
I have been following this account and enjoying all their posts of amazing product designs - and I didn't even realise until I read this comment that it's AI-generated!
General Mastodon question here: I click login, but I can't log in with my credentials because I'm on mastodon.social. How do I log in from this interface to comment in this thread, or is this not possible?
You could write an algorithm that does image recognition and cut and paste from a plethora of image resources and build a similar catalog of actual products from a given time period.
This just looks like the typical machine learning throw-up: a bunch of statistically averaged and collaged images that have then been blurred, as if someone ran their finger across the image, plus random text. It isn't clear to me where the intrigue is, aside from a "hmph, I guess (?) that's cool".
I think it's just a common sort of ex nihilo first-steps example of the technology that's easy to show off to anyone, and it's a good example of how the results can be iterated and cherry-picked to filter out the most garbage-laden images to get stuff that's basically (what another commenter called) "visual lorem ipsum".
There's a lot more that stable diffusion can do when there is a feedback loop between the user and the computer, but I don't think it's very easy to convey with pop articles or even long form ones - I hope one day everyone has a chance to approach models like this and learn from them in their own way, and I appreciate articles like this in their attempt to get a wider audience interested in the technology.
That’s a well balanced perspective. And I don’t mean to be a Luddite or anything, but I just don’t think I see what this will be useful for outside of the typical advertising and abusive use cases.
One use case I can currently imagine, outside of advertising, is maybe storyboarding, because you don’t really care much about fidelity or even style there, and it’s primarily a scaffolding tool. However, I’m not terribly sure of the feedback loop being anywhere near that of a director or cinematographer or writer sitting down with a storyboard artist. But maybe you don’t have access to a storyboard artist.
There is a valid position of asking “why do we need this?”, and I don’t think it gets asked enough in technology. One thing I am sure of is that this type of machine learning art will be abused.
There is an unfortunate inevitability though with humans and technology.
I don't understand the motivation of the person who sees 60 hours of video uploaded to YouTube per second, 30+ new video games released on Steam per day, 100,000 songs uploaded to Spotify every day and 6,000 Tweets being made per second, and decides "You know what the World needs? A way of enabling more people to make more content faster."
> You could write an algorithm that does image recognition and cut and paste from a plethora of image resources and build a similar catalog of actual products from a given time period.
And if you did that I would think it was super interesting and cool too.
Because by typing three words and clicking two buttons you get to claim you're an artist and your ticket for the future at the same time
It's one more step in the general dumbing down of everything tech touches. You don't need skills, you don't need to devote time to it, you don't even need to understand how it works; just go on a website, give them your money, write something and boom, you're done.
I think being stuck in a content consumption cycle for a while made people slowly realise that creation is much more fulfilling than consumption; these things give them the illusion of creating things.
I use stable diffusion almost daily and this is a really creative use! I have leaned into the weird text that comes with these types of prompts and I love it.
The J. Peterman catalog, which was a storyline on Seinfeld, is a real catalog and apparently as pretentious as the show’s mockery made it out to be. It would be good grist for the SD/GPT mill.
I worked at a Radio Shack in 1986-87, sort of a dream job for an 18 year old. Now I know what it would have looked like if I showed up for work one day on LSD!
It’s funny though, one consequence of the federation is that the domain name on the link gives you no clue you’re going to see Mastodon when you click it. (Unlike centralized services like Twitter or Facebook.)
If you wanted to see all the Mastodon links posted to HN, you’d have to either start with a list of all known Mastodon server domains and search those, or scrape all the links and pick out the ones that land on a Mastodon instance.
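The second approach is easy to rough out. A quick Python sketch that probes a link's domain for the /api/v1/instance endpoint Mastodon servers expose (an assumption on my part; it won't catch every Fediverse server, and the example URL is made up):

    import requests
    from urllib.parse import urlparse

    def looks_like_mastodon(link, timeout=5):
        # Probe the domain for Mastodon's instance metadata endpoint.
        domain = urlparse(link).netloc
        try:
            r = requests.get("https://%s/api/v1/instance" % domain, timeout=timeout)
            return r.ok and "uri" in r.json()
        except (requests.RequestException, ValueError):
            return False

    print(looks_like_mastodon("https://mastodon.social/@someuser/123456789"))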
I'm seeing increasing usage on discord as well, in the usual exchange-of-memes process. Things that would have been tweet links are now mastodon links.
This is probably significantly increasing the discoverability of mastodon pods. I previously had no idea what pods to join, now I can see what pods the people in my discords are active in or consuming content from.