We've had voice input and voice output with computers for a long time, but it's never felt like spoken conversation. At best it's a series of separate voice notes. It feels more like texting than talking.
These demos show people talking to artificial intelligence. This is new. Humans are more partial to talking than writing. When people talk to each other (in person or over low-latency audio) there's a rich metadata channel of tone and timing, subtext, inexplicit knowledge. These videos seem to show the AI using this kind of metadata, in both input and output, and the conversation even flows reasonably well at times. I think this changes things a lot.
The "magic" moment really hit in this, like you're saying. Watching it happen and being like "this is a new thing". Not only does it respond in basically realtime, it concocts a _whole response_ back to you as well. It's like asking someone what they think about chairs, and then that person being able to then respond to you with a verbatim book on the encyclopedia of chairs. Insane.
I'm also incredibly excited about the possibility of this as an always available coding rubber duck. The multimodal demos they showed really drove this home, how collaboration with the model can basically be as seamless as screensharing with someone else. Incredible.
Still patiently waiting for the true magic moment where I don't have to chat with the computer, I just tell it what to do and it does it without even an 'OK'.
I don't want to chat with computers to do basic things. I only want to chat with computers when the goal is to iterate on something. If the computer is too dumb to understand the request and needs to initiate iteration, I want no part.
(See also 'The Expanse' for how sci-fi imagined this properly.)
For me, this is seriously impressive, and I already use LLMs every day - but a serious "Now we're talkin" moment would be when I'd be able to stand outside of Lowe's and ask my glasses/earbuds, "Hey, I'm in front of Lowe's, where do I get my air filters from?"
and it tells me if it's in stock, aisle and bay number. (If you can't tell, I am tired from fiddling with apps lol)
I would guess that most companies will not want to provide APIs that an agent could use to make that kind of query. So, the agent is going to have to use the app just like you would, which looks like it will definitely become possible, but again, Lowes wants the human to see the ads. So they're going to try to break the automation.
It's going to take customers demanding (w/$) this kind of functionality and it will probably still take a long time as the companies will probably do whatever they can to maintain (or extend) control.
At some level, isn’t “connecting you effortlessly with the product you explicitly told me you were here to find” the best kind of ad? To the extent that Lowe’s hires armies of friendly floor staff specifically to answer that kind of question face to face, help my dumb self figure out what the right filter size and type is, learn the kind of particulars about my house that the LLM will just know, and build my confidence that my intentions are correct in my case?
Google has always made it hard to avoid clicking the “ad” immediately above the organic result for a highly specific named entity, but where it’s really struck me is as Amazon has started extracting “sponsorship” payments from its merchants. The “sponsored” product matching my search is immediately above the unpaid organic result, identical in appearance.
That kind of convergence suggests to me that the Lowe’s of the world don’t need to “show the ad” in the conventional sense, they just need to reduce the friction of the sale—and they stand to gain more from my trust and loyalty over time than from a one-off upsell.
I’m reminded of Autozone figuring out, on their dusty old text consoles, how to just ask me my make/model/year, and how much value and patronage that added relative to my local mom-n-pop parts store since I just knew all the parts were going to be right.
That's kinda what I meant with customers demanding it with their money. But, avoiding upselling is not really what I see stores doing. I don't want the cashier (or payment terminal) to push me to open new credit accounts or buy warranties. I don't want them to arrange their stores so I have to walk past more ads and products that I'm not interested in today. They still do it, and they work hard at doing it.
I'm on the Lowe's website right now. Can you point out an ad? Because I don't see any. And why do you think that companies can't inject advertising into their LLMs? It's easy to do, and with a long enough customer relationship it gets very powerful. It's like a sales clerk who remembers everything you have ever bought and appears to understand your timing.
As for data, I can name several major retailers who expose the stock/aisle number via a public api. That information is highly available and involved in big dollar tasks like inventory management.
When I go to the Lowe's website, the homepage itself is covered in ads. "Spring Into Deals", "Lowe's and Messi are assisting you with 100 points! Join Our Loyalty Program". "Get up to 35% off select major appliances"... the more I scroll, the more ads come up.
Companies can inject ads into their own LLMs, sure. But ChatGPT is somebody else's LLM.
Your point about retailers exposing stock/aisle number via a public API surprises me. What do you mean by public? What's the EULA look like? Exposing stock/aisle number via API for the purpose of inventory management is not a use case that would require making API access public.
If they want to sell more products to more people they will need to provide those APIs. If an AI assistant can make home maintenance more accessible then that will translate to more people shopping at Lowes more often but only if their inventory and its location are accessible by the assistant helping the customer decide which store to go to for the right part. If your store blocks the assistant then it’s going to suggest the competitor who provides access. It would be even better if the assistant can place an order for curbside pickup.
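Purely as an illustration of what that could look like, here's a minimal sketch of an assistant querying a retailer's inventory API; the endpoint, parameters, and response fields are all hypothetical, not any real retailer's API:

```python
import requests

# Hypothetical endpoint and fields -- purely illustrative, not a real API.
resp = requests.get(
    "https://api.example-retailer.com/v1/stores/1234/inventory",
    params={"sku": "AIR-FILTER-16x25x1"},
    timeout=10,
)
resp.raise_for_status()
item = resp.json()
if item["in_stock"]:
    print(f"In stock: aisle {item['aisle']}, bay {item['bay']}")
```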
Or we could overcome this with a distributed system where the devices of individuals who have been to the store recently record data about the current location of products and upload it somewhere for the rest of the users to query if needed.
More likely future LLMs will mix ads into their responses. ("Your air filters are in stock in aisle A6. Have you heard about the new membership plan from Lowes...?")
If it was a real Personal Assistant I would just have to say: "I want to pick up my home air filter at Lowes today." and it would 1. know what brand/model air filter I needed, 2. know which Lowes is my local one, 3. place the order for me, and 4. let me know when it will be available to pick up.
Do they want a better ad than what the GP was describing? There isn't one they can buy.
(But yeah, I guess they will want it, and break any reasonable utility from their stores in the process. That's what everybody does today; I'm not holding my breath for management to grow some competence out of nowhere in the future.)
I want it to instruct me exactly how to achieve things. While agents doing stuff for me is nice, my agency is more important and investing into myself is best. Step by step, how to make bank -- what to say, what to do.
Automation tech frees up time but takes away agency and opportunity in exchange.
Empowerment tech creates opportunity and increases agency, but it needs you to have time and resources, and these costs can easily increase existing gaps between social classes.
This was exemplified to me by the recent Tesla Full Self Driving trial that was installed on my car. When using it, my desire for agency was constant -- it was excruciating to co-pilot the car with my hands necessarily on the wheel, ready to take over at any moment. It was not "right enough of the time" for me.
I think the movie "Her" buried the lede. Why have a girlfriend in one's ear when one could have a compilation of great entrepreneurs multimodally telling you what to do?
Re: The Expanse. I must have missed that. Maybe that’s the point. People no longer think of a computer as some separate thing that needs to be interacted with.
The best example is the scene where Alex has to plot a course to the surface of Ganymede without being detected by the Martian and Earth navies. He goes over multiple iterations of possible courses with the computer adjusting for gravity assists and avoiding patrols etc... by voice pretty seamlessly.
Hmmm...maybe I should name my next company Vegetable or Chicken so that folks accidentally buy my stock. Sort of like naming your band "Blank Tape" back in the 90's.
> I don't want to chat with computers to do basic things. I only want to chat with computers when the goal is to iterate on something. If the computer is too dumb to understand the request and needs to initiate iteration, I want no part.
This is called an "employee" and all you need to do is pay them. If you don't want to do that, then I have to wonder: Is what you want slavery?
As goofy as I personally think this is, it's pretty cool that we're converging on something like C-3PO or Plankton's computer with nothing more than the entire corpus of the world's information, a bunch of people labeling data, and a big pile of linear algebra.
There probably is, since I believe tensors were basically borrowed from Physics at some point. But it's probably not of much practical use today, unless you want to explore Penrose's ideas about microtubules or something similarly exotic.
Gains in AI and compute can probably be brought back to physics and chemistry to do various computations, though, and not only for protein folding, which is the most famous use case now.
For what it's worth, the idea of a "tensor" in ML is pretty far removed from any physical concept. I don't know its mathematical origins (would be interesting I'm sure), but in ML they're only involved because that's our framework for dealing with multi-linear transformations.
Most NNs work by something akin to "(multi-)linear vector transformation, followed by elementwise nonlinear transformation", stacked over and over so that the output of one layer becomes the input of the next. This applies equally well to simple models like "fully-connected" / "feed-forward" networks (aka "multi-layer perceptron") and to more-sophisticated models like transformers (e.g. https://github.com/karpathy/nanoGPT/blob/325be85d9be8c81b436...).
It's less about combining lots of tiny local linear transformations piecewise, and more about layering linear and non-linear transformations on top of each other.
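As a minimal sketch of that "linear transformation, then elementwise nonlinearity, stacked" pattern (my own toy example in NumPy; the shapes and the ReLU choice are illustrative, not anything specific to nanoGPT):

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, W, b):
    # (multi-)linear transformation, then elementwise nonlinearity (ReLU)
    return np.maximum(0.0, x @ W + b)

# Stack two layers: the output of one becomes the input of the next.
x = rng.standard_normal(4)                         # input vector
W1, b1 = rng.standard_normal((4, 8)), np.zeros(8)  # layer 1 weights/bias
W2, b2 = rng.standard_normal((8, 2)), np.zeros(2)  # layer 2 weights/bias
h = layer(x, W1, b1)
y = h @ W2 + b2  # final layer often left linear
print(y)
```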
I don't really know how physics works beyond whatever Newtonian mechanics I learned in high school. But unless the underlying math is similar, then I'm hesitant to run too far with the analogy.
I realized that my other answer may have come off as rambling for someone not at all familiar with modern physics. Here's a summary:
Most modern physics, including Quantum Mechanics (QM) and General Relativity (GR), is represented primarily through "tensor fields" on a type of topological space called a "manifold". Tensor fields are like vector fields, just with tensors instead of vectors.
These tensor fields are then constrained by the laws of physics. At the core, these laws are really not so much "forces" as they are symmetries. The most obvious symmetry is that if you rotate or move all objects within a space, the physics should be unaltered. Now if you also insist that the speed of light should be identical in all frames of reference, you basically get Special Relativity (SR) from that.
The electromagnetic, weak, and strong forces follow from invariance under the combined U(1) x SU(2) x SU(3) symmetries. (Gravity is not considered a real force in General Relativity (GR), but rather an interaction between spacetime and matter/energy, and what we observe as gravity is similar to the time dilation of SR, but with curved space.)
Ok. This may be abstract if you're not familiar with it, and even more so if you're not familiar with Group Theory. But it will be referenced further down.
"Manifolds" are a subset of topological spaces that are Euclidian or "flat" locally. This flatness is important, because it's basically (if I understand it correctly myself) the reason why we can use linear algebra for local effects.
I will not go into GR here, since that's what I know least well, but instead focus on QM which describes the other 3 forces.
In QM, there is the concept of the "Wave Function", which is distributed over space-time. This wave function is really a tensor with components that give rise to observable fields, such as magnetism, the electric field, and the weak and strong forces. (The tensor is not the observed fields directly, but a combination of a generalization of the fields and also analogues to electric charge, etc.)
So how physics calculations tend to be done is that one starts by assuming something like an initial state, and then imposes the symmetries that correspond to the forces. For instance, two electrons' wave functions may travel towards the same point from different directions.
The symmetries will then dictate what the wave function looks like at each later incremental point in time. Computationally, such increments are calculated for each point in space using tensor multiplication.
While this is "local" in space, points immediately next to the point we're calculating for need to be included, kind of like for convolutional nets.
Basically, though, it's in essence a tensor multiply for each point in space to propagate the wave function from one point in time to the immediate next point.
Eventually, once the particles have (or have not) hit each other, the wave functions of each will scatter in all directions. The probability for it to go in any specific direction is proportional to the wave function amplitude in that direction, squared.
Since doing this tensor multiplication for every point in space requires infinite compute, a lot of tricks are used to reduce the computation. And this is where a lot of our intuitions about "particles" show up. For simple examples, one can even do very good approximations using calculus. But fundamentally, tensor multiplication is the core of Quantum Mechanics.
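To make the "local tensor multiply per time step" picture concrete, here is a deliberately crude toy sketch (my own, not from the comment above) of propagating a discretized 1D free-particle wave function, where each step is a local linear operator applied across the grid:

```python
import numpy as np

# Units chosen so hbar = m = 1; grid and step sizes are arbitrary toy values.
N, dx, dt = 200, 0.1, 0.0005
x = np.arange(N) * dx
psi = np.exp(-(x - 10.0) ** 2) * np.exp(1j * 5.0 * x)  # Gaussian wave packet

# Discrete Laplacian: a tridiagonal matrix, i.e. a *local* linear operator
# coupling each grid point only to its immediate neighbours.
lap = (-2 * np.eye(N) + np.eye(N, k=1) + np.eye(N, k=-1)) / dx**2

for _ in range(200):
    # Explicit Euler step of i dpsi/dt = -(1/2) lap psi. Crude and not
    # norm-preserving; real solvers use e.g. Crank-Nicolson or split-step.
    psi = psi + dt * 0.5j * (lap @ psi)

prob = np.abs(psi) ** 2   # probability density (amplitude squared)
print(prob.sum() * dx)    # total probability, roughly constant over this short run
```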
This approach isn't unique to QM, though. A lot of other Physics is similar. For instance, solid state physics, lasers or a lot of classical mechanics can be described in similar frameworks, also using tensors and symmetry groups. (My intuition is that this still is related to Physics involving local effects on "locally flat" Manifolds)
And this translates all the way up to how one would do the kind of simulations of aspects of physical worlds that happen in computer games inside GPUs, including the graphics parts.
And here I believe you may see how the circle is starting to close. Simulations and predictions of physical systems at many different levels of scale and abstraction tend to reduce to tensor multiplication of various sorts. While the classical physics one learns in high school tends to have problems solvable with calculus, even those are usually just solutions to problems that are fundamentally linear algebra locally.
While game developers or ML researchers initially didn't use the same kind of Group Theory machinery that Physics has adopted, at least the ML side seems to be going in that direction, based on texts such as:
(There appears to be a lot of similar findings over the last 5-6 years or so, that I wasn't fully aware of).
In the book above, the methodology used is basically identical to how theoretical physics approaches similar problems, at least for networks that describe physical reality (which CNNs tend to be good for).
And here is my own (current) hypothesis for why this also seems to be extendable to things like LLMs, which do not at face value appear like physics problems:
If we assume that the human brain evolved the ability to navigate the physical world BEFORE it developed language (should be quite obvious), it should follow that the type of compute fabric in the brain should start out as optimized for the former. In practice, that means that at the core, the neural network architecture of the brain should be good at doing operations similar to tensor products (or approximations of such).
And if we assume that this is true, it shouldn't be surprising that when we started to develop languages, those languages would take on a form that was suitable to be processed in compute fabric similar to what was already there. To a lesser extent, this could even partially explain why such networks can also produce symbolic math and even computer code.
Now what the brain does NOT seem to have evolved to do is what traditional Turing Machine computers are best at, namely doing a lot of very precise procedural calculations. That part is very hard for humans to learn to do well.
So in other words, the fact that physical systems seem to involve tensor products (without requiring accuracy) may be the explanation to why Neural Networks seem to have a large overlap with the human brain in terms of strengths and weaknesses.
My understanding (as a data engineer with an MSc in experimental particle physics a long time ago) is that the math representation is structurally relatively similar, with the exception that while ML tensors are discrete, QM tensors are multi-dimensional arrays locally but are defined as a field over continuous space.
Tensors in Physics are also subject to various "gauge" symmetries. That means that physical outcomes should not change if you rotate them in various ways. The most obvious is that you should be able to rotate or translate the space representation without changing the physics. (This leads to things like energy/momentum conservation).
The fundamental forces are consequences of some more abstract (at the surface) symmetries (U(1) x SU(2) x SU(3)). These are just constraints on the tensors, though. Maybe these constraints can be in the same family as backprop, though I don't know how far that analogy goes.
In terms of representation, the spacetime part of Physics tensors is also treated as continuous. Meaning that when, after doing all the matrix multiplication, you come to some aggregation step, you aggregate by integrating instead of summing over spacetime (you sum over the discrete dimensions). Obviously though, when doing the computation in a computer, even integration reduces to summing if you don't have an exact solution.
In other words, it seems to me that what I originally replied to, namely the marvel about how much of ML is just linear algebra / matrix multiplication IS relatively analogous to how brute force numerical calculations over quantum fields would be done. (Theoretical Physicists generally want analytic solutions, though, so generally look for integrals that are analytically solvable).
Both domains have steps that are not just matrix multiplication. Specifically, Physics tends to need a sum/integral when there is an interaction or the wave function collapses (which may be the same thing). Though even sums can be expressed as dot products, I suppose.
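As a one-line sketch of that last point: the continuous aggregation reduces, numerically, to a sum, which is itself a dot product:

```latex
\int f(x)\,dx \;\approx\; \sum_{i=1}^{N} f(x_i)\,\Delta x
\;=\; \mathbf{f}\cdot(\Delta x\,\mathbf{1}),
\qquad \mathbf{f} = \bigl(f(x_1),\dots,f(x_N)\bigr).
```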
As mentioned, Physics will try to solve a lot of the steps in calculations analytically. Often this involves decomposing integrals that cannot be solved into a sum of integrals where the lowest-order ones are solvable and also tend to carry most of the probability density. This is called perturbation theory and is what gives rise to Feynman diagrams.
One might say that for instance a convolution layer is a similar mechanic. While fully connected nets of similar depth MIGHT theoretically be able to find patterns that convolutions couldn't, they would require an impossibly large amount of compute to do so, and also make regularization harder.
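A quick back-of-envelope comparison of why that is (my numbers, just for scale): a fully-connected layer on even a small image has vastly more parameters than a shared convolution kernel.

```python
# One layer over a 32x32 single-channel image, ignoring biases.
image_pixels = 32 * 32
fc_params = image_pixels * image_pixels  # every input connects to every output
conv_params = 3 * 3                      # one shared 3x3 kernel slid over the image
print(fc_params, conv_params)            # 1048576 vs 9
```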
Anyway, this may be a bit hand-wavy from someone who is a novice at both quantum field theory and neural nets. I'm sure there are others out there that know both fields much better than me.
Btw, while writing this, I found the following link that seems to take the analogy between quantum field theory and CNN nets quite far (I haven't had time to read it)
I browsed the linked book/article above a bit, and it's a really close analogy to how physics is presented.
That includes how it uses Group Theory (especially Lie Algebra) to describe symmetries, and to use that to explain why convolutional networks work as well as they do for problems like vision.
The notation (down to which Latin and Greek letters are used) makes it obvious that this was taken directly from Quantum Mechanics.
Is this a trick question? OpenAI blatantly used copyrighted works for commercial purposes without paying the IP owners, it would only be fair to have them publish the resulting code/weights/whatever without expecting compensation. (I don't want to publish it myself, of course, just transform it and sell the result as a service!)
I know this won't happen, of course, I am moreso hoping for laws to be updated to avoid similar kerfuffles in the future, as well as massive fines to act as a deterrent, but I don't dare to hope too much.
I was envisioning a future where we've done away with the notion of data ownership. In such a world the idea that we would:
> have all of OpenAI's data for free
Doesn't really fit. Perhaps OpenAI might successfully prevent us from accessing it, but it wouldn't be "theirs" and we couldn't "have" it.
I'm not sure what kind of conversations we will be having instead, but I expect they'll be more productive than worrying about ownership of something you can't touch.
So in that world you envision someone could hack into openai, then publish the weights and code. The hacker could be prosecuted for breaking into their system, but everyone else could now use the weights and code legally.
I think that would depend on whether OpenAI was justified in retaining and restricting access to that data in the first place. If they weren't, then maybe they get fined and the hacker gets a part of that fine (to encourage whistleblowers). I'm not interested in a system where there are no laws about data, I just think that modeling them after property law is a mistake.
I haven't exactly drafted this alternative set of laws, but I expect it would look something like this:
If the data is derived from sources that were made available to the public with the consent of its referents (and subject to whatever other regulation), then walling it off would be illegal. On the other hand, data regarding users' behavior would be illegal to share without the users' consent and might even be illegal to retain without their consent.
If you want to profit from something derived from public data while keeping it private, perhaps that's ok, but you have to register its existence and pay taxes on it as a data asset, much like we pay taxes on land. That way we can wield the tax code to encourage companies that operate in the clear. This category would probably resemble patent law quite a bit, except ownership doesn't come by default; you have to buy your property rights from the public (since by owning that thing, you're depriving the masses of access to it, and since the notion that it is a peg that fits in a property-shaped hole is a fiction that requires some work on our part to maintain).
This is alleged, and it is very likely that claimants like the New York Times accidentally prompt-injected their own material to show the violation (not understanding how LLMs really work), clouded by the hope of a big payday rather than actual justice/fairness etc...
Anyways, the laws are mature enough for everyone to work this out in court. Maybe it comes out that they have a legitimate concern, but the way they presented their evidence so far in public has seriously been lacking.
Prompt injecting their own article would indeed be an incredible show of incompetence by the New York Times. I'm confident that they're not so dumb that they put their article in their prompt and were astonished when the reply could reproduce the prompt.
Rather, the actual culprit is almost certainly overfitting. The articles in question were pasted many times on different websites, showing up in the training data repeatedly. Enough of this leads to memorization.
They hired a third party to make the case, and we know nothing about that party except that they were lawyers. It is entirely possible, since this happened very early in the LLM game, that they didn’t realize how the tech worked, and fed it enough of their own article for the model to piece it back together. OpenAI talks about the challenge of overfitting, and how they work to avoid it.
The goal is to end up with a model capable of discovering all the knowledge on its own, not relying on what humans produced before. Human knowledge contains errors; I want the model to point out those errors and fix them.
The current state is a crutch at best to get over the current low capability of the models.
Or rather, I have an unending stream of callers with similar-sounding voices who all want to make chirpy persuasive arguments in favor of Mr Altman's interests.
These models literally need ALL data. The amount of work it would take just to account for all the copyrights, let alone negotiate and compensate the creators, would be infeasible.
I think it’s likely that the justice system will deem model training as fair use, provided that the models are not designed to exactly reproduce the training data as output.
I think you hit on an important point though: these models are a giant transfer of wealth from creators to consumers / users. Now anyone can acquire artist-grade art for any purpose, basically for free — that’s a huge boon for the consumer / user.
People all around the world are going to be enriched by these models. Anyone in the world will be able to have access to a tutor in their language who can teach them anything. Again, that is only possible because the models eat ALL the data.
Another important point: original artwork has been made almost completely obsolete by this technology. The deed is done, because even if you push it out 70 years, eventually all of the artwork that these models have been trained on will be public domain. So, 70 years from now (or whatever it is) the cat will be out of the bag AND free of copyright obligations, so 2-3 generations from now it will be impossible to make a living selling artwork. It’s done.
When something becomes obsolete, it’s a dead man walking. It will not survive, even if it may take a while for people to catch up. Like when the vacuum tube computer was invented, that was it for relay computers. Done. And when the transistor was invented, that was it for vacuum tube computers.
It’s just a matter of time before all of today’s data is public domain and the models just do what they do.
> The amount of work it would take just to account for all the copyrights, let alone negotiate and compensate the creators, would be infeasible.
Your argument is the same as Facebook saying “we can’t provide this service without invading your privacy” or another company saying “we can’t make this product without using cancerous materials”.
Tough luck, then. You don’t have the right to shit on and harm everyone else just because you’re a greedy asshole who wants all the money and is unwilling to come up with solutions to problems caused by your business model.
This is bigger than the greed of any group of people. This is a technological sea change that is going to displace and obsolesce certain kinds of work no matter where the money goes. Even if open models win where no single entity or group makes a large pile of money, STILL the follow-on effects from wide access to models trained on all public data will unfold.
People who try to prevent models from training on all available data will simply lose to people who don’t, and eventually the maximally-trained models will proliferate. There’s no stopping it.
Assume a world where models proliferate that are trained on all publicly-accessible data. Whatever those models can do for free, humans will have a hard time charging money for.
That’s the sea change. Whoever happens to make money through that sea change is a sub-plot of the sea change, not the cause of it.
If you want to make money in this new environment, you basically have to produce or do things that models cannot. That’s the sink or swim line.
If most people start drowning then governments will be forced to tax whoever isn’t drowning and implement UBI.
>Tough luck, then. You don’t have the right to shit on and harm everyone else just because you’re a greedy asshole who wants all the money
It used to be that property rights extended all the way to the sky. This understanding was updated with the advent of the airplane. Would a world where airlines need to negotiate with every land-owner their planes fly above be better than ours? Would commercial flight even be possible in such a world? Also, who is greediest in this scenario, the airline hoping to make a profit, or the land-owners hoping to make a profit?
Your comment seems unfair to me. We can say the exact same thing for the artist / IP creator:
Tough luck, then. You don’t have the right to shit on and harm everyone else just because you’re a greedy asshole who wants all the money and is unwilling to come up with solutions to problems caused by your business model.
Once the IP is on the internet, you can't complain about a human or a machine learning from it. You made your IP available on the internet. Now, you can't stop humanity benefiting from it.
Talk about victim blaming. That’s not how intellectual property or copyright work. You’re conveniently ignoring all the paywalled and pirated content OpenAI trained on.
First, “Plaintiffs ACCUSE the generative AI company.” Let’s not assume OpenAI is guilty just yet. Second, assuming OpenAI didn’t access the books illegally, my point still remains. If you write a book, can you really complain about a human (or in my humble opinion, a machine) learning from it?
There's zero doubt that people will still create art. Almost no one will be paid to do it though (relative to our current situation where there are already far more unpaid artists than paid ones). We'll lose an immeasurable amount of amazing new art that "would have been" as a result, and in its place we'll get increasingly bland/derivative AI generated content.
Much of the art humans will create entirely for free in whatever spare time they can manage after their regular "for pay" work will be training data for future AI, but it will be extremely hard for humans to find as it will be drowned out by the endless stream of AI generated art that will also be the bulk of what AI finds and learns from.
AI will just be another tool that artists will use.
However the issue is that it will be much harder to make a career in the digital world from an artistic gift and personal style: one's style will not be unique for long as AI will quickly copy it and so make the original much less valuable.
AI will certainly be a tool that artists use, but non-artists will use it too so very few will ever have the need to pay an artist for their work. The only work artists are likely to get will be cleaning up AI output, and I doubt they'll find that to be very fulfilling or that it pays them well enough to make a living.
When it's harder to make a career in the digital world (where most of the art is), it's more likely that many artists will never get the opportunity to fully develop their artistic gifts and personal style at all.
If artists are lucky then maybe in a few generations with fewer new creative works being created, AI almost entirely training on AI generated art will mean that the output will only get more generic and simplistic over time. Perhaps some people will eventually pay humans again for art that's better quality and different.
The prevalence of these lines of thought makes me wonder if we'd see a similar backlash against Star Trek-style food replicators. "Free food machines are being used by greedy corporations to put artisanal chefs out of business. We must outlaw the free food machines."
I'll gladly put money on music that a human has poured blood, sweat, tears and emotion into. Streaming has already killed profits from album sales so live gigs is where the money is at and I don't see how AI could replace that.
Lol, you really want content creators to aid AI in replacing them without any compensation? Would you also willingly train devs to do your job after you've been laid off, for free?
What nonsense. Just because doing the right thing is hard, or inconvenient doesn't mean you get to just ignore it. The only way I'd be ok with this is if literally the entire human population were equal shareholders. I suspect you wouldn't be ok with that little bit of communism.
There is no way on Earth that people playing by the existing rules of copyright law will be able to compete going forward.
You can bluster and scream and shout "Nonsense" all you want, but that's how it's going to be. Copyright is finished. When good models are illegal or unaffordable, only outlaws -- meaning hostile state-level actors with no allegiance to copyright law -- will have good models.
We might as well start thinking about how the new order is going to unfold, and how it can be shaped to improve all of our lives in the long run.
I think there's no stopping this train. Whoever doesn't train on all available data will simply not produce the models that people actually use, because there will be people out there who do train models on all available data. And as I said in another comment, after some number of decades all of the content that has been used to train current models will be in the public domain anyway. So it will only be a few generations before this whole discussion is moot and the models are out there that can do everything today's models can, unencumbered by any copyright issues.

Digital content creation has been made mostly obsolete by generative AI, except for where consumers actively seek out human-made content because that's their taste, or if there's something humans can produce that models cannot. It's just a matter of time before this all unfolds.

So yes, anyone publishing digital media on the internet is contributing to the eventual collapse of people earning money to produce content that models can produce. It's done. Even if copyright delays it by some decades, eventually all of today's media will be public domain and THEN it will be done. There are 0 odds of any other outcome.
To your last point, I think the best case scenario is open source/weight models win so nobody owns them.
> We've designed society to give rewards to people who produce things of value
Is that really what copyright does though? I would be all for some arrangement to reward valuable contributions, but the way copyright goes about allocating that reward is by removing the right of everyone but the copyright holder to use information or share a cultural artifact. Making it illegal to, say, incorporate a bar you found inspiring into a song you make and share, or to tell and distribute stories about some characters that you connected with, is profoundly anti-human.
I'm shocked at how otherwise normally "progressive" folks or even so called "communists" will start to bend over for IP-laws the moment that they start to realize the implications of AI systems. Glad to know that accusations of the "gnulag" were unfounded I guess!
I now don't believe most "creative" types when they try to spout radical egalitarian ideologies. They don't mean it at all, and even my own family, who religiously watched radical techno-optimist shows like Star Trek, are now falling into the depths of Luddism and running to the defense of copyright trolls.
If you're egalitarian, it makes sense to protest when copyright is abolished only for the rich corporations but not for actual people, don't you think? Part of the injustice here is that you can't get access to windows source code, or you can't use Disney characters, or copy most copyrighted material... But OpenAI and github and whatnot can just siphon all data with impunity. Double standard.
Copyright has been abolished for the little guy. I’m talking about AI safety doomers who think huggingface and Civit.AI are somehow not the ultimate good guys in the AI world.
This is a foul mischaracterization of several different viewpoints. Being opposed to a century-long copyright period for Mickey Mouse does not invalidate support for the concept of IP in general, and for the legal system continuing to respect the licensing terms of very lenient licenses such as CC-BY-SA.
I wonder how long until we see a product that's able to record workstation displays and provide a conversational analysis of work conducted that day by all of your employees.
> Instinctively, I dislike a robot that pretends to be a real human being.
Is that because you're not used to it? Honestly asking.
This is probably the first time it feels natural, whereas all our previous experiences with "chat bots", "automated phone systems", and "automated assistants" were absolutely terrible.
Naturally, we dislike it because "it's not human". But this is true of pretty much anything that approaches the uncanny valley. But if the "it's not human" solves your problem 100% better/faster than the human counterpart, we tend to accept it a lot faster.
This is the first real contender. Siri was the "glimpse" and ChatGPT is probably the reality.
[EDIT]
https://vimeo.com/945587328 the Khan Academy demo is nuts. The inflections are so good. It's pretty much right there in the uncanny valley, because it does still feel like you're talking to a robot, but you're also directly interacting with it. Crazy stuff.
> It speaks in customer service voice. That faux friendly tone people use when they're trying to sell you something.
Mmmmm, while I get that, in the context of the grandparent comment, would having a human be any better? It's effectively the same, because realistically that's a pretty common voice/tone to get even in tech support.
The problem is you don't like the customer service/sales voice because they "pretend to be your friends".
Let me know if I didn't capture it.
I don't think people "pretend to be my friend" when they answer the phone to help me sort out an airline ticket problem. I do believe they're trained to, and work to, take on a "friendly" tone. Even if the motive isn't genuine, because it's trained, it's a way nicer experience than someone who's angry or even simply monotone. Trying to fix my $1200 plane ticket is stressful enough. I don't need the CSR to make it worse.
Might be cultural, but I would prefer a neutral tone. The friendly tone creates an expectation of a good result, which makes it worse when the problem is not solvable or not within the agent's power to solve - which is often the case; you don't call support for simple problems.
Of course I agree that "angry" is in most cases not appropriate, but still, I can see cases in which it might be; for example, if the caller is really aggressive, curses, or unreasonably blames the agent, the agent could become angry. Training people to expect that everybody will answer them "friendly" no matter their behavior does not sound good to me.
Being human doesn't make it worse. Saccharine phonies are corny when things are going well and dispiriting when they're meant to be helping you and fail.
I wonder if you can ask it to change its inflections to match a personal conversation as if you're talking to a friend or a teacher or in your case... a British person?
This is where Morgan Freeman can clean up with royalty payments. Who doesn’t want Ellis Boyd Redding describing ducks and math problems in kind and patient terms?
> This is probably the first time it feels natural
Really? I found this demo painful to watch and literally felt that "cringe" feeling. I showed it to my partner and she couldn't even stand to hear more than a sentence of the conversation before walking away.
It felt both staged and still frustrating to listen to.
And, like far too much in AI right now, a demo that will likely not pan out in practice.
Emotions are a medium for conveying feelings, but our sensitivity to human emotions can also be a vector for manipulation.
Especially when you consider the bottom line that this tech will ultimately be shoehorned into advertising somehow (read: the field dedicated to manipulating you into buying shit).
> Emotions are a medium for conveying feelings, but our sensitivity to human emotions can also be a vector for manipulation.
When one gets to be a certain age one begins to become attuned to this tendency of others' emotions to manipulate you, so you take steps to not let that happen. You're not ignoring their emotions, but you can address the underlying issue more effectively if you're not emotionally charged. It's a useful skill that more people would benefit from learning earlier in life. Perhaps AI will accelerate that particular skill development, which would be a net benefit to society.
> When one gets to be a certain age one begins to become attuned to this tendency of others' emotions to manipulate you
This is incredibly optimistic, which I love, but my own experience with my utterly deranged elder family, made insane by TV, contradicts this. Every day they're furious about some new things fox news has decided it's time to be angry about: white people being replaced (thanks for introducing them to that, tucker!), "stolen" elections, Mexicans, Muslims, the gays, teaching kids about slavery, the trans, you name it.
I know nobody else in my life more emotionally manipulated on a day to day basis than them. I imagine I can't be alone in watching this happen to my family.
What if this technology could be applied so you can’t be manipulated? If we are already seeing people use this to simulate and train sales people to deal with tough prospects we can squint our eyes a bit and see this being used to help people identify logical fallacies and con men.
Great replacement and white genocide are white nationalist far-right conspiracy theories. If you believe this is happening, you are the intellectual equivalent of a flat-earther. Should we pay attention to flat-earthers? Are their opinions on astronomy, rocketry, climate, and other sciences worth anyone's time? Should we give them a platform?
> In the words of scholar Andrew Fergus Wilson, whereas the islamophobic Great Replacement theory can be distinguished from the parallel antisemitic white genocide conspiracy theory, "they share the same terms of reference and both are ideologically aligned with the so-called '14 words' of David Lane ["We must secure the existence of our people and a future for white children"]." In 2021, the Anti-Defamation League wrote that "since many white supremacists, particularly those in the United States, blame Jews for non-white immigration to the U.S.", the Great Replacement theory has been increasingly associated with antisemitism and conflated with the white genocide conspiracy theory. Scholar Kathleen Belew has argued that the Great Replacement theory "allows an opportunism in selecting enemies", but "also follows the central motivating logic, which is to protect the thing on the inside [i.e. the preservation and birth rate of the white race], regardless of the enemy on the outside."
> and not wanting your children to be groomed into cutting off their body parts.
This doesn't happen. In fact, the only form of gender-affirming surgery that any doctor will perform on under-18 year olds is male gender affirming surgery on overweight boys to remove their manboobs.
> You are definitely sane and your entire family is definitely insane.
You sound brave, why don't you tell us what your username means :) You're one to stand by your values, after all, aren't you?
Well, when you ask someone why they don't want to have more children, they can shrug and say "population reduction is good for the climate" as if serving the greater good, and completely disregard any sense of "patriotic duty" to have more children that some politicians, such as Vladimir Putin, would like to instill. They can justify it just as easily as you can be deranged enough to call it a government conspiracy.
With AI you can do A/B testing (or multi-arm bandits, the technique doesn't matter) to get into someone's mind.
Most manipulators end up getting bored of trying again and again with the same person. That won't happen if you are dealing with a machine, as it can change names, techniques, contexts, tones, etc. until you give it what its operator wants.
Maybe you're part of the X% who will never give in to a machine. But keep in mind that most people have no critical thinking skills nor mental fortitude.
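For concreteness, a minimal epsilon-greedy sketch of the bandit idea described above; the variant names and success signal are invented for illustration only:

```python
import random

variants = ["friendly", "urgent", "flattering", "authoritative"]
wins = {v: 0 for v in variants}
tries = {v: 0 for v in variants}

def pick(eps=0.1):
    # Explore occasionally; otherwise exploit the best-performing variant so far.
    if random.random() < eps or not any(tries.values()):
        return random.choice(variants)
    return max(variants, key=lambda v: wins[v] / max(tries[v], 1))

def record(variant, succeeded):
    tries[variant] += 1
    wins[variant] += int(succeeded)
```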
Problem is, people aren't machines either: someone who's getting bombarded with phishing requests will begin to lose it, and will be more likely to just turn off their Wi-Fi than allow an AI to run a hundred iterations of a multi-armed bandit approach on them.
I think we often get better at detecting the underlying emotion with which the person is communicating, seeing beyond the one they are trying to communicate in an attempt to manipulate us. For example, they say that $100 is their final price but we can sense in the wavering of their voice that they might feel really worried that they will lose the deal. I don't think this will help us pick up on those cues because there are no underlying real emotions happening, maybe even feeding us many false impressions and making us worse at gauging underlying emotions.
> Especially when you consider the bottom line that this tech will ultimately be shoehorned into advertising somehow.
Tools and the weaponization of them.
This can be said of pretty much any tech tool that has the ability to touch a good portion of the population, including programming languages themselves, or CRISPR.
I agree we have to be careful of the bad, but the downsides in this case are not so dangerous that we should be trying to suppress it because the benefits can be incredible too.
The concern is that it's being locked up inside of major corporations that aren't the slightest bit trustworthy. To make this safe for the public, people need to be able to run it on their own hardware and make their own versions of it that suit their needs rather than those of a megacorp.
This tech isn't slowing down. Our generation may hesitate at first, but remember, this field is progressing at astonishing speed; we are literally one generation away.
Why can't it also inspire you? If I can forgo advertising and have ChatGPT tutor my child on geometry, and they actually learn it at a fraction of the cost of a human tutor, why is that bothersome? Honest question. Why do so many people default to something sinister going on? If this technology shows real efficacy in education at scale, take my money.
Because it is obviously going to be used to manipulate people. There is absolutely 0 doubt about that (and if there is I'd love to hear your reasoning). The fact that it will be used to teach geometry is great. But how many good things does a technology need to do before the emotional manipulation becomes worth it?
I don't think OpenAI is doing anything particularly sinister. But whatever OpenAI has today a bad actor will have in October. This horseshit is moving rather fast. Sorry, but in two years going from failing the turing test to being able to have a conversation with an AI agent nearly indistinguishable from a person is going to be destabilizing.
AI is going to be fantastic at teaching skills to students that those students may never need, since the AI will be able to do all the work that requires such skills, and do them faster, cheaper and at a higher level of quality.
These sorts of comments are going to go down in the annals with the Hacker News people complaining about Dropbox when it first came out. This is so revolutionary. If you're not agog, you're just missing the obvious.
Something can be revolutionary and have hideous flaws.
(Arguably, all things revolutionary do.)
I'm personally not very happy about this for a variety of reasons; nor am I saying AI is incapable of changing the entire human condition within our lifetimes. I do claim that we have little reason to believe we're headed in a more-utopian direction with AI.
I think pets often feel real emotions, or at least bodily sensations, and communicate those to humans in a very real way, whether through barking or meowing or whimpering or whatnot. So while we may care for them as we would for a human, just as we may care for a plant or a car, I think if my car started to say it felt excited for me to take it for a drive, I might also feel uncomfortable.
They do, but they've evolved neoteny (baby-like cries) to do it, and some of their emotions aren't "human" even though they are really feeling them.
Silly example, but some pets like guinea pigs are almost always hungry and they're famous for learning to squeak at you whenever you open the fridge or do anything that might lead to giving them bell peppers. It's not something you'd put up with a human family member using their communication skills to do!
There’s definitely an element of evolution: domesticated animals have evolved to have human recognizable emotions. But that’s not to say they’re not “real” or even “human.” Do humans have a monopoly on joy? I think not. Watch a dog chase a ball. It clearly feels what we call joy in a very real sense.
Adult dogs tend to retain many of the characteristics that wolf puppies have, but grow out of when they become adults.
We've passively bred out many of the behaviors that lead to wolves becoming socially mature. Dogs that do mature socially tend to be too dangerous to have around, since they may challenge their owners (more than they already do) for dominance of the family.
AI's will probably be designed to do the same thing, so they will not feel threatening to us. But in the case of AGI/ASI, we will never know if they actually have this kind of subservience, or if they're just faking it for as long as it benefits them.
Good thing you can tell the AI to speak to you in a robotic monotone and even drop IQ if you feel the need to speak with a dumb bot. Or abstain from using the service completely. You have choices. Use them.
Until your ISP fires their entire service department in a foolish attempt to "replace" them with an overfunded chatbot-service-department-as-a-service and you have to try to jailbreak your way through it to get to a human.
But I think this animosity is very much expected, no? Even I felt a momentary hint of "jealousy" -- if you can even call it that -- when I realized that we humans are, in a sense, not really so special anymore.
But of course this was the age-old debate with our favorite golden-eyed android; and unsurprisingly, he too received the same sort of animosity:
Bones was deeply skeptical when he first met Data: "I don't see no points on your ears, boy, but you sound like a Vulcan." And we all know how much he loved those green-blooded fools.
Likewise, Dr. Pulaski has since been criticized for her rude and dismissive attitude towards Data, which had flavors of what might even be considered "racism", or so goes the Trekverse discussion on the topic.
And let's of course not forget when he was on trial, essentially for his "humanity", or whether he was indeed just the property of Starfleet and nothing more.
The more recent incarnation, Star Trek: Picard, illustrated the outright ban on "synthetics" and indeed their effective banishment; non-synthetic life -- from human to Romulan -- simply wasn't ok with them.
Yes this is all science fiction silliness -- or adoration depending on your point of view -- but I think it very much reflects the myriad directions our real life world is going to scatter (shatter?) in the coming years ahead.
To your point, there's been a lot of talk about AI, regulation, guardrails, whatever. Now is the time to say, AI must speak such that we know it's AI and not a real human voice.
We get the upside of conversation, and avoid the downside of falling asleep at the wheel (as Ethan Mollick mentions in "Co-Intelligence".)
Exactly. I'm not sure if this is brand new or not, but this is definitely on the frontier.
I was literally just thinking about this a few days ago... that we need a multi-modal language model with speech training built-in.
As soon as this thing rolls out, we'll be talking to language models like we talk to each other. Previously it was like dictating a letter and waiting for the responding letter to be read to you. Communication is possible, but not really in the way that we do it with humans.
This is MUCH more human-like, with the ability to interrupt each other and glean context clues from the full richness of the audio.
The model's ability to sing is really fascinating. Its ability to change the sound of its voice -- its pacing, its pitch, its tonality. I don't know how they're controlling all that via GPT-4o tokens, but this is much more interesting stuff than what we had before.
I honestly don't fully understand the implications here.
> Humans are more partial to talking than writing.
Amazon, Google, and Apple have sunk literally billions of dollars into this idea only to find out that, no, we aren't.
We are with other humans, yes. When socialization is part of the conversation. When I'm talking to my local barista I'm not just ordering a coffee, I'm also maintaining a relationship with someone in my community.
But when it comes to work, writing >>> talking. Writing is clarity of ideas. Talking is cult of personality.
And when it comes to inputs/outputs, typing is more precise and more efficient.
Don't get me wrong, this is an incredibly revolutionary piece of technology, but I don't think the benefits of talking you're describing (timing, subtext, inexplicit knowledge) are achievable here either (for now), since even that requires HOURS of interaction over days/weeks/months of experiences for humans to achieve with each other.
I use voice assistants and find them quite useful, but I've had to learn the interface and memorise the correct trigger phrases. If GPT-4o works half as well in practice as it does in the demos, then it's categorically a different thing.
I don't think they've sunk $1 into that idea. They've sunk billions into a different idea: that people enjoy using their vocal cords more than their hands to compose messages to send to each other. That is not a spoken conversation, it's just correspondence with voice input/output options.
Writing is only superior to conversation when weighed against discussions with more than 3 people. A quick call with one or two other people always results in more progress being made as long as everyone involved wants to get it done. Messaging back and forth takes much more time and often leads to misunderstandings.
I wouldn't say speaking is mostly for short exchanges of information. Sometimes it's the opposite: my wife will text me for simple comments or requests, but for anything complicated she'll put the phone to her ear and call me. Or coworkers often want to set up a meeting rather than exchange a series of asynchronous emails -- iteration, brainstorming, Q&A, and the like can be more agile with voice than it can with text.
I'm 100% a text-everything, never-call person, but I can't live without Alexa these days; every time I'm in a hotel or on vacation I nearly ask a question out loud.
I also hate how much Alexa sucks, so this is a big deal. I spent years weeding out what it could and couldn't do, so it will be nice to have one that I don't have to treat like a toddler.
I started using the Pi LLM app (by Inflection.ai) with my kids about six months ago and was completely blown away by how human-like it sounded, not just the voice itself but the way it expresses itself, the tiny pauses and hesitations, the human-like imperfections. It does feel like conversing with another human -- I've never seen another LLM do that.
(We mostly use it in car trips -- great for keeping the kids (ages 8, 12) occupied with endless Harry Potter trivia questions, answers to science questions, etc.)
Indeed, the 2013 Spike Jonze movie is the first thing that popped up in my mind when I saw those videos.
Amazing to see this movie 10 years after it was released, in light of these "futuristic" tools (AI assistants and such).
Yeah it's the worst. And 'um' doesn't seem to work, you actually need convincing filler words. It feels like being forced to speak under duress.
I've long felt that embracing the concept of the 'prompt' was a terrible idea for Siri and all the other crappy voice assistants. They built ecosystems on top of this dumb reduction, which only engineers could have made: that _talking to someone_ is basically taking turns to compose a series of verbal audio snippets in a certain order.
The previous ChatAI app was getting pretty good once you learned when to run sentences together and when to break them up.
The tonality and inflections in the voice are a little too good.
Most people, taken across the spectrum/on average, aren't that good at speaking and communicating, and that makes this stand out as uncanny valley. It is mind-bogglingly good at it, though.
I'm human and much, much more partial to typing than talking. Talking is a lot of work for me, and I can't process my thinking well at all without writing.
I don't think that's generally true, other than for socializing with other humans.
Note how people, now having a choice, prefer to text each other most of the time rather than voice call.
I don't think people sitting at work in their cube farm want to be talking to their computer either. The main use for voice would seem to be for occasional use talking to an assistant on a smartphone.
Maybe things will change in the future when we get to full human AGI level, treating the AGI as an equal, more as a person.
When I was working at the IBM Speech group circa 1999 as a contractor on an embedded speech system (the IBM Personal Speech Assistant), I discussed with Raimo Bakis (a researcher there at the time) this issue of such metadata and how it might improve conversational speech recognition. It turned out that IBM ViaVoice detected some of that metadata (like pitch/tone as a reflection of emotion) -- but then threw it away on purpose rather than using it for anything. Back then it was so much harder to get speech recognition to do anything useful -- beyond limited transcripts of audio with ~5% error rates that were good enough mainly for searching -- that perhaps doing that made sense. Very interesting to see such metadata in use now, both in speech recognition and in speech generation.
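To make that "metadata" concrete: below is a minimal sketch of extracting a pitch contour and a loudness curve from a recording with librosa. The file name is hypothetical, and this shows only the detection step -- it is not how ViaVoice worked:

```python
# Minimal sketch: pull prosodic "metadata" (pitch and loudness) from speech.
# Requires librosa and numpy; "utterance.wav" is a hypothetical recording.
import numpy as np
import librosa

y, sr = librosa.load("utterance.wav")     # mono waveform plus sample rate

# Fundamental-frequency (pitch) contour via the pYIN estimator.
f0, voiced_flag, voiced_prob = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
)

# Crude summaries a recognizer could condition on instead of discarding:
voiced = f0[voiced_flag]                  # keep only voiced frames
print("median pitch (Hz):", np.nanmedian(voiced))
print("pitch range (Hz):", np.nanmax(voiced) - np.nanmin(voiced))

# Frame-level RMS energy -- loudness dynamics, another emotional cue.
rms = librosa.feature.rms(y=y)[0]
print("mean RMS energy:", rms.mean())
```

A rising median pitch or a widening pitch range over an utterance is exactly the sort of cue a system could feed downstream rather than throw away.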
More on the IBM Personal Speech Assistant (for which I am on a since-expired patent), written up by Liam Comerford:
http://liamcomerford.com/alphamodels3.html
"The Personal Speech Assistant was a project aimed at bringing the spoken language user interface into the capabilities of hand held devices. David Nahamoo called a meeting among interested Research professionals, who decided that a PDA was the best existing target. I asked David to give me the Project Leader position, and he did. On this project I designed and wrote the Conversational Interface Manager and the initial set of user interface behaviors. I led the User Interface Design work, set specifications and approved the Industrial Design effort and managed the team of local and offsite hardware and software contractors. With the support of David Frank I interfaced it to a PC based Palm Pilot emulator. David wrote the Palm Pilot applications and the PPOS extensions and tools needed to support input from an external process. Later, I worked with IBM Vimercati (Italy) to build several generations of processor cards for attachment to Palm Pilots. Paul Fernhout, translated (and improved) my Python based interface manager into C and ported it to the Vimercati coprocessor cards. Jan Sedivy's group in the Czech Republic Ported the IBM speech recognizer to the coprocessor card. Paul, David and I collaborated on tools and refining the device operation. I worked with the IBM Design Center (under Bob Steinbugler) to produce an industrial design. I ran acoustic performance tests on the candidate speakers and microphones using the initial plastic models they produced, and then farmed the design out to Insync Designs to reduce it to a manufacturable form. Insync had never made a functioning prototype so I worked closely with them on Physical UI and assemblability issues. Their work was outstanding. By the end of the project I had assembled and distributed nearly 100 of these devices. These were given to senior management and to sales personnel."
Thanks for the fun/educational/interesting times, Liam!
As a bonus for that work, I had been offered one of the chessboards that had been used when IBM Deep Blue defeated Garry Kasparov, but I turned it down, as I did not want a symbol of AI defeating humanity around.
Twenty-five years later, how far that aspiration towards conversational speech with computers has come. Some ideas I've put together to help deal with the fallout:
https://pdfernhout.net/beyond-a-jobless-recovery-knol.html
"This article explores the issue of a "Jobless Recovery" mainly from a heterodox economic perspective. It emphasizes the implications of ideas by Marshall Brain and others that improvements in robotics, automation, design, and voluntary social networks are fundamentally changing the structure of the economic landscape. It outlines towards the end four major alternatives to mainstream economic practice (a basic income, a gift economy, stronger local subsistence economies, and resource-based planning). These alternatives could be used in combination to address what, even as far back as 1964, has been described as a breaking "income-through-jobs link". This link between jobs and income is breaking because of the declining value of most paid human labor relative to capital investments in automation and better design. Or, as is now the case, the value of paid human labor like at some newspapers or universities is also declining relative to the output of voluntary social networks such as for digital content production (like represented by this document). It is suggested that we will need to fundamentally reevaluate our economic theories and practices to adjust to these new realities emerging from exponential trends in technology and society."
Another idea for dealing with the consequences is using AI to facilitate Dialogue Mapping with IBIS in meetings, to help small groups of people collaborate better on "wicked problems" like weighing AI's pros and cons (as in this 2019 talk I gave at IBM's Cognitive Systems Institute Group):
https://twitter.com/sumalaika/status/1153279423938007040
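For readers who haven't run into IBIS: Dialogue Mapping builds a shared tree from a tiny vocabulary of node types -- questions, ideas, and pro/con arguments. Here's a minimal sketch of that structure, with class and field names of my own choosing rather than from any particular tool:

```python
# Minimal sketch of the IBIS (Issue-Based Information System) vocabulary
# behind Dialogue Mapping. Names here are illustrative, not from a real tool.
from dataclasses import dataclass, field
from typing import List, Literal

NodeType = Literal["question", "idea", "pro", "con"]

@dataclass
class Node:
    kind: NodeType
    text: str
    children: List["Node"] = field(default_factory=list)

    def add(self, kind: NodeType, text: str) -> "Node":
        child = Node(kind, text)
        self.children.append(child)
        return child

# Questions spawn ideas; ideas accumulate pros and cons.
root = Node("question", "How should we handle AI's pros and cons?")
idea = root.add("idea", "Fund a basic income from automation gains")
idea.add("pro", "Decouples income from a shrinking pool of paid jobs")
idea.add("con", "Requires broad political agreement to fund")
```

An AI facilitator's job in a meeting would then be classifying what people say into these node types and attaching it to the right parent, so the group can see the argument structure as it grows.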
I wouldn't call out the depression bit as a Gen Z exclusive. Millennials basically invented modern, everyday gallows humor. Arguably, they're also the ones who normalized going to therapy. Not to say that things aren't bad, just that that part didn't start with Gen Z.
>Millennials basically invented modern, every day, gallows humor
lmao what... they absolutely didn't. This is why no one should take anyone on this site seriously about anything: confidently incorrect, and easily conned into the next VC-funded marketing project.
Suicidal humor is very much a Millennial trait. They weren't the first to make those jokes but they definitely made it bigger, more common, and went beyond the standard "ugh, just kill me now" you'd hear before.
Older people think younger people are stupid and reckless, and vice versa. And the younglings think they've "figured it out" like no one before them. But no one ever tries to understand each other in the process. Rinse and repeat.