
I'm not convinced the onus should be on one side to prove why something isn't an existential risk. We don't start from the assumption that anything else is world-ending; we generally need to see a plausibly worked-through example of how the world ends, using technology we can all broadly agree exists or will shortly exist.

If we're talking about nuclear weapons, for example, the tech is clear, the pattern of human behaviour is clear: they could cause immense, species-level damage. There's really little to argue about. With AI, there still seems to be a lot of hand-waving between where we are now and "AGI". What we have now is in many ways impressive, but the onus is still on the claimant to show that it's going to turn into something much more dangerous through some known progression. At the moment there is a very big, underpants gnomes-style "?" gap before we get to AGI/profit, and if people are basing this on currently secret tech, then they're going to have to reveal it if they want people to think they're doing something other than creating a legislative moat.



AI safety / x-risk folks have in fact made extensive and detailed arguments. Occasionally, folks arguing against them rise to the same standard. But most of the arguments against AI safety look a lot more like name-calling and derision: "nuh-uh, that's sci-fi and unrealistic (mic drop)". That's not a counterargument.

> If we're talking about nuclear weapons, for example, the tech is clear, the pattern of human behaviour is clear: they could cause immense, species-level damage.

That's easy to say now that the damage is largely done: they've been not only tested but used, many countries have them, and the knowledge of how to make them is widespread.

How many people arguing against AI safety today would also have argued for widespread nuclear proliferation when the technology was still in development and nothing had been exploded yet? How many would have argued against nuclear regulation as being unnecessary, or derided those arguing for such regulation as unrealistic or sci-fi-based?


I understand your point, I think - and certainly I don't want to go anywhere near name-calling or derision; that doesn't help anyone. But I am reminded of arguments I've had with creationists (I am not comparing you with them, just, at times, the general tone of the debate). It seems like one side is making an extraordinary claim and then demanding the other side rebut it, and that doesn't seem reasonable to me.

The thing about nuclear weapons is that the theoretical science was clear before the testing - building and testing them was proof by demonstration, but many people agreed with the theory well before that. How they would be used was certainly debated, but there was a clear and well-explained proposal for every step of their creation, which could be tested and falsified if needed. I don't think that's the case here - there seems to be more of a claim for a general acceleration with an inevitable endpoint, and that claim of inevitability feels very short on grounding.

I am more than prepared to admit that I may not be seeing (for various reasons) the evidence that this is near/possible - but I would also claim that nobody is convincingly showing any either.


>I don't want to go anywhere near name-calling or derision, that doesn't help anyone.

Characterizing your opponent's argument as an appeal to "underpants gnomes" struck me as derisive, if you don't mind my saying.

If space ships operated by an alien civilization appeared in orbit above Earth, wouldn't that constitute a potent danger? I'd say it certainly would, because the aliens might be (and probably would be, if they could travel here) better at science and technology than we are. The AI labs, by their own admission, are trying to create an alien intelligence as good at science and technology as possible. Yes, they're probably at least a decade or two away from "succeeding" at creating one better at science and technology than well-funded teams of humans are, but the AI labs might surprise us (and surprise themselves) by "succeeding" much sooner: everyone, including I'm sure the researchers at OpenAI, was surprised when GPT-4 was able to score in the 90th percentile on the bar exam.

Because even the researchers creating these frontier models don't understand them well enough to say for sure that the next model they spend hundreds of millions of dollars of GPU time training won't exceed human ability in something dangerous like inventing new military technologies, the time to stop creating new large frontier models is now.

GPT-4 has a lot of knowledge of the world, but it is much less capable than a human is (and more to the point, than a capable human organization like the FBI or Microsoft is) at devising plans able to withstand determined human opposition. One of the things I'm worried about is new models that are much better at such a planning task _and_ have lots of knowledge about things like physics and human nature. One reason to worry that such a new model is not far off is that AlphaZero is better than humans at creating plans that can withstand determined human opposition (but of course AlphaZero has no knowledge of and in fact no way of obtaining knowledge of any part of reality beyond a Go board).


Companies declare that they are trying to build better AI, and that the ultimate purpose is AGI. The definition of AGI given by the companies and the one used by AI alignment/safety researchers are similar. AI safety people believe it is dangerous.

Let me continue using the nuclear bomb as a metaphor: suppose we don't know whether building a nuclear bomb is possible, but some companies declare they are already making progress on creating this new bomb...

The danger of a nuclear bomb is obvious, because it is designed as a bomb. Companies are trying to build an AGI that is similar to the dangerous AGI in AI safety researchers' predictions, so the danger is obvious too.


They declare that - but I could also declare I'm trying to build a nuclear bomb (n.b. I'm not). Whether people are likely to try and stop me, or try and apply some legal non-proliferation framework, is partly influenced by whether they believe what I'm claiming is realistic (it's not - I have a workshop, but no fissile material).

Nobody gets too worried about me doing something which would be awful but which, by general consensus, I won't achieve. Until a company gives some credible evidence they're close to AGI... (And companies have millions/billions of reasons to claim they are when they're not, so scepticism is warranted.)


I think it's reasonable to at least act as if they really are trying to do what they say they're trying to do.

Like with the nuclear bomb situation - I think it would be reasonable for someone to try to stop you from building a nuclear bomb, or check if you really were, even if they had no idea how you could have gotten the materials. Because it would be really bad if you did. I think people would be worried about you trying to do something awful even if there was low confidence it was possible. They wouldn't be as worried as they would be if they thought you were more capable, but still worried.

So I guess that comes down to whether you think companies saying they want to make AGI are more like a toddler, a teenager, you, a ballistics engineer who owns a uranium mine, Lockheed-Martin, or a nation-state trying to make a nuclear bomb. My understanding is that people who are concerned about AI x-risk are largely in the teenager to Lockheed-Martin range (e.g., small eventual risk to large imminent risk), while I assume you think it's more in the toddler range (no risk at all for a long time).


All good points. Now playing devil's advocate: building a nuclear bomb in my basement was very difficult, I admit. But since I already have my spyware installed everywhere, the moment a dude comes up with an AGI, it will immediately be shared with all my fellow hackers through BitTorrent, eDonkey, Hyphanet, GNUnet, Kad and Tor, just to name a few.


It is surely better to have regulation now than to scramble to catch up if AGI turns out to be possible.


What about nation states? Do you really think the US military will avoid working towards AGI if they think it would give them a tactical advantage? Or the CCP? Or North Korea? Personally, if AGI gets developed, I’d rather the first iteration be in the hands of someone who doesn’t also have access to nuclear weapons.


Why wouldn't the government just instantly take it? Hell, why wouldn't the corporations just sell it to the highest bidder?


They may well. But if we ban research altogether, the only ones with access will be governments. At least with an open system there would be some competition.


So do we just pray AGI is impossible?

Companies are trying, and spending a great deal of money, to make it possible at this very moment.


> Companies declare that they are trying to build better AI, and that the ultimate purpose is AGI.

They do declare it, but nobody has even come up with a plausible path from where we are today, to anything like AGI.

At this point they might as well declare that they're trying to build time machines.


Yes, but why do we even give them the chance, when they know their ultimate purpose is dangerous?


> With AI, there still seems to be a lot of hand-waving between where we are now and "AGI".

> I am more than prepared to admit that I may not be seeing (for various reasons) the evidence that this is near/possible - but I would also claim that nobody is convincingly showing any either.

If I understand you correctly, then (1) you doubt that AGI systems are possible and (2) even if they are possible, you believe that humans are still very far away from developing one.

The following is an argument for the possibility of AGI systems.

  Premise 1: Human brains are generally intelligent.
  Premise 2: If human brains are generally intelligent, then software simulations of human brains at the level of inter-neuron dynamics are generally intelligent.
  Conclusion: Software simulations of human brains at the level of inter-neuron dynamics are generally intelligent.
(fyi I believe there is an ~82% chance humans will develop an AGI within the next 30 years.)


For info: I don't believe (1), I do believe (2) although not that strongly - it's more likely to be a leap than a gradient, I suspect - I simply don't see anything right now that convinces me it's just over the next hill.

Your conclusion... maybe, yes - I don't think we're anywhere near a simulation approach with sufficient fidelity however. Also 82% is very specific!


> For info: I don't believe (1), I do believe (2) although not that strongly

Thanks for clarifying. Do you believe there is a better than 20% chance that humans will develop AGI in the next 30 years?

> I simply don't see anything right now that convinces me it's just over the next hill.

These are the reasons that I believe we are close to developing an AGI system.

  (1) Many smart people are working on capabilities.
  (2) Many investment dollars will flow into AI development in the near future.
  (3) Many impressive AI systems have recently been developed: Meta's CICERO, OpenAI's GPT4, DeepMind's AlphaGo.
  (4) Hardware will continue to improve.
  (5) LLM performance significantly improved as data volume and training time increased (see the rough scaling sketch just after this list).
  (6) Humans have built other complex artefacts without good theories of the artefact, including: operating systems, airplanes, beer.
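
A rough way to make (5) quantitative, for what it's worth, is the empirically fitted scaling-law form people report for LLM pretraining loss (the exact constants and exponents vary by study, so treat the symbols below as a sketch rather than settled numbers):

  L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}

where N is the parameter count, D the number of training tokens, and E an irreducible loss floor; the reported exponents are typically somewhere around 0.3. Loss falls smoothly, as a power law, in both model size and data, which is the observation behind (5).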


These achievements are impressive, but I'd rather not overhype it.

* GPT-4 still hallucinates like hell, can't do math, fails at basic logic, can't handle really big contexts, is hard to update, is easy to jailbreak, etc.

* AlphaGo was defeated by a Go amateur with the help of another AI.

* AlphaStar basically failed to achieve its real goals and was trivial to cheese, even though it sometimes defeated high-ranked players.

All these problems are architectural; you can't just throw more money and GPUs at them the way you could going from GPT-2 to GPT-3 to GPT-4.

It's hard to predict at this point. We may get to AGI anywhere from 5 years to 100 years.


I don't think these reasons are very persuasive, as everything but (5) has been true at different times in the past. Obviously there are now many more people, more dollars, and more impressive systems (but slower hardware progress), but I hope you see what I'm getting at.

And of course there's differences in what someone considers to be soon. Many AI x-risk believers think there's a ~50% chance of AGI before 2031 (https://www.metaculus.com/questions/5121/date-of-artificial-...) (I've heard this prediction site's userbase tends towards futurists/techno-optimists/AI x-riskers). I would consider that soon, I wouldn't consider 2054 soon.


Also, (3) the claim that AGI in practice will necessarily pose any danger to humans is doubtful. After all, Earth has billions of human-level intelligences, nearly all of them useless, and if they are even mildly dangerous it's due more to their numbers and disgusting biology than to their intelligence.


TBQH, most of the AI safety x-risk arguments — different than just "AI safety" arguments in the sense that non-x-risk issues don't seem worth banning AI development over — are generally pretty high on the hypotheticals. If you feel the x-risk arguments aren't pretty hypothetical, can you:

1. Summarize a good argument here, or

2. Link to someone else's good argument?

I feel like hand-waving the question away and saying "[other people] have in fact made extensive and detailed arguments" isn't going to really convince anyone... Any more than the hypothetical robot disaster arguments do. Any argument against x-risk can be waved off with "Oh, I'm not talking about that bad argument, I'm talking about a good one," but if you don't provide a good one, that's a bit of a No True Scotsman fallacy.

I've read plenty of other people's arguments! And they haven't convinced me, since all the ones I've read have been very hypothetical. But if there are concrete ones, I'd be interested in reading them.


Consider a world in which AI existential risk is real: where at some point AI systems become dramatically more capable than human minds, in a way that has catastrophic consequences for humanity.

What would you expect this world to look like, say, five years before the AI systems become more capable than humans? How (if at all) would it differ from the world we are actually in? What arguments (if any) would anyone be able to make, in that world, that would persuade you that there was a problem that needed addressing?

So far as I can tell, the answer is that that world might look just like this world, in which case any arguments for AI existential risk in that world would necessarily be "very hypothetical" ones.

I'm not actually sure how such arguments could ever not be hypothetical arguments. If AI-doom were already here so we could point at it, then we'd already be dead[1].

[1] Or hanging on after a collapse of civilization, or undergoing some weird form of eternal torture, or whatever other horror one might anticipate by way of AI-doom.

So I think we either (1) have to accept that even if AI x-risk were real and highly probable we would never have any arguments for it that would be worth heeding, or (2) have to accept that sometimes an argument can be worth heeding even though it's a hypothetical argument.

That doesn't necessarily mean that AI x-risk arguments are worth heeding. They might be bad arguments for reasons other than just "it's a hypothetical argument". In that case, they should be refuted (or, if bad enough, maybe just dismissed) -- but not by saying "it's a hypothetical argument, boo".


This is exactly the kind of hypothetical argument I'm talking about. You could make this argument for anything — e.g. when radio was invented, you could say "Consider a world in which extraterrestrial x-risk is real," and argue radio should be banned because it gives us away to extraterrestrials.

The burden of proof isn't on disproving extraordinary claims, the burden of proof is on the person making extraordinary claims. Just like we don't demand every scientist spend their time disproving cold fusion claims, Bigfoot claims, etc. If you have a strong argument, make it! But circular arguments like this are only convincing to the already-faithful; they remind me of Christian arguments that start off with: "Well, consider a world in which hell is real, and you'll be tormented for eternity if you don't accept Jesus. If you're Christian, you avoid it! And if it's not real, well, there's no harm anyway, you're dead like everyone else." Like, hell is real is a pretty big claim!


I didn't make any argument -- at least, not any argument for or against AI x-risk. I am not, and was not, arguing (1) that AI does or doesn't in fact pose substantial existential risk, or (2) that we should or shouldn't put substantial resources into mitigating such risks.

I'm talking one meta-level up: if this sort of risk were a real problem, would all the arguments for worrying about it be dismissable as "hypothetical arguments"?

It looks to me as if the answer is yes. Maybe you're OK with that, maybe not.

(But yes, my meta-level argument is a "hypothetical argument" in the sense that it involves considering a possible way the world could be and asking what would happen then. If you consider that a problem, well, then I think you're terribly confused. There's nothing wrong with arguments of that form as such.)

The comparisons with extraterrestrials, religion, etc., are interesting. It seems to me that:

(1) In worlds where potentially-hostile aliens are listening for radio transmissions and will kill us if they detect them, I agree that probably usually we don't get any evidence of that until it's too late. (A bit like the alleged situation with AI x-risk.) I don't agree that this means we should assume that there is no danger; I think it means that ideally we would have tried to estimate whether there was any danger before starting to make a lot of radio transmissions. I think that if we had tried to estimate that we'd have decided the danger was very small, because there's no obvious reason why aliens with such power would wipe out every species they find. (And because if there are super-aggressive super-powerful aliens out there, we may well be screwed anyway.)

(2) If hell were real then we would expect to see evidence, which is one reason why I think the god of traditional Christianity is probably not real.

(3) As for yeti, cold fusion, etc., so far as I know no one is claiming anything like x-risk from these. The nearest analogue of AI x-risk claims for these (I think) would be, when the possibility was first raised, "this is interesting and worth a bit of effort to look into", which seems perfectly correct to me. We don't put much effort into searching for yeti or cold fusion now because people have looked in ways we'd expect to have found evidence, and not found the evidence. (That would be like not worrying about AI x-risk if we'd already built AI much smarter than us and nothing bad had happened.)


This article — and my statements — are not about "is this interesting and worth a bit of effort to look into." The article is about how current AI safety orgs have tried to make current open-source models illegal. That's a much stronger position than just "this is interesting, let's look into it."

Sure! By all means look into whatever seems interesting to you. But claiming that it should be banned, to me, seems like it requires a much stronger argument than that.

(P.S. I'm not sure why hell should obviously have real world evidence: it supposedly exists only in a non-physical afterlife, accessible only to the dead. It's unconvincing because there is no evidence, but I don't see why you think there would be any; it's simply that the burden of proof for extraordinary claims rests on the claimant, and no proof has been given.)


You made an analogy between AI x-risk and e.g. cold fusion. I pointed out that there's an important disanalogy here: no one is claiming or has claimed that cold fusion poses an existential threat. Hence, the nearest cold-fusion claim to any AI x-risk claims is "cold fusion is worth investigating" (which it was, once, and isn't now).

It looks to me as if (1) you made an analogy that doesn't really work, then (2) when I pointed out how it doesn't work, (3) you said "look, you're making an analogy that doesn't really work". That doesn't seem very fair.

I wouldn't expect hell itself to have physical-world evidence. But the idea of hell doesn't turn up as an isolated thing, it comes as part of a package that also says e.g. that the world is under the constant supervision of an all-powerful, supremely good being, and that I would expect to have physical-world evidence.

I have no problem with the principle that extraordinary claims require extraordinary evidence. The difficult thing is deciding which claims count as "extraordinary". A lot of theists would say that atheism is the extraordinary claim, on the grounds that until recently almost everyone believed in a god or gods. (I'm not sure that's actually quite true, but it might be true for e.g. "Western" societies.) I don't agree and I take it you don't either, but once the question's raised you actually have to look at the various claims being made and how plausible they are: you can't just say "look, obviously this claim is extraordinary and that claim isn't".

Advocates of AI x-risk might say: it's not an extraordinary claim that AI systems will keep getting more powerful -- they're doing that right now and it's not at all uncommon for technological progress to continue for a while. And it's not an extraordinary claim that they'll get smarter than us along whatever axis you choose to measure -- that's a thing that's happened over and over again in particular domains. And it's not an extraordinary claim that something smarter than us might pose a big threat to our well-being or even our existence; look at what we've done to everything else on the planet.

You, on the other hand, would presumably say that actually some or all of those are extraordinary claims. Or perhaps that their conjunction is extraordinary even if the individual conjuncts aren't so bad.

Unfortunately, "extraordinary" isn't a term with a precise definition that we know how to check objectively. It's a shorthand for something like "highly improbable given the other things we know" or "highly implausible given the other things we know", and if someone doesn't agree with you that something is an "extraordinary" claim I don't know of any way to convince them that doesn't involve actually engaging with it.

(Of course you might not care whether you convince them. If all you want to do is to encourage other people who think AI x-risk is nonsense, saying "extraordinary claim" and "burden of proof" and so on may be plenty sufficient.)


If you want to make a research avenue illegal, IMO you need evidence that it's harmful. If there isn't evidence — minus circular claims that already assume it's harmful — I don't think it should be illegal. Very simple. This isn't an "analogy," it's what is happening in reality and is what the article is about.


I was not arguing for making anything illegal.

"But that's what the argument was about!" No, it's what the OP was about, but this subthread was about the statement that AI x-risk arguments are "pretty hypothetical". Which, I agree, they are; I just don't see how they could possibly not be, even in possible worlds where in fact they are correct. If that's true, it seems relevant to complaints that the arguments are "hypothetical".

To repeat something I said before: it could still be that they're terrible arguments and/or that they don't justify any particular thing they're being used to justify (like, e.g., criminalizing some kinds of AI research). But if you're going to dismiss them just because they're "hypothetical", then you need to be comfortable accepting that this is a class of (yes, hypothetical) risk that can never be mitigated in advance, because even if the thing is going to happen we'll never get anything other than "hypothetical arguments" before it actually does.

You may very well be comfortable accepting that. For my part, I find that I am more comfortable accepting some such things than others, and how comfortable I am with it depends on ... how plausible the arguments actually are. I have to go beyond just saying "it's hypothetical!".

If I'm about to eat something and someone comes up to me and says "Don't eat that! The gods might hate people eating those and torture people who do in the afterlife!" then I'm comfortable ignoring that, unless they can give me concrete reasons for thinking such gods are likely. If I'm about to eat something and someone comes up to me and says "Don't eat that! It's a fungus you just picked here in this forest and you don't know anything about fungi and some of them are highly poisonous!" then I'm going to take their advice even if neither of us knows anything about this specific fungus. These are both "hypothetical arguments"; there's no concrete evidence that there are gods sending people who eat this particular food to hell, or that this particular fungus is poisonous. One of them is much more persuasive than the other, but that's for reasons that go beyond "it's hypothetical!".

To repeat once again: I am not claiming that AI x-risk arguments are in fact strong enough to justify any particular action despite their hypothetical-ness. Only that there's something iffy about using "it's only hypothetical" on its own as a knockdown argument.


Does the strongest argument that AI existential risk is a big problem really open by exhorting the reader to imagine it's a big problem? Then asking them to come up with their own arguments for why the problem needs addressing?


I doubt it. At any rate, I wasn't claiming to offer "the strongest argument that AI existential risk is a big problem". I wasn't claiming to offer any argument that AI existential risk is a big problem.

I was pointing out an interesting feature of the argument in the comment I was replying to: that (so far as I can see) its reason for dismissing AI x-risk concerns would apply unchanged even in situations where AI x-risk is in fact something worth worrying about. (Whether or not it is worth worrying about here in the real world.)


I think what is meant is "hypothetical" in the sense of making assumptions about how AI systems would behave under certain circumstances. If an argument relies on a chain of assumptions like that (such as "instrumental convergence" and "reflective stability" to take some Lesswrong classics), it might look superficially like a good argument for taking drastic action, but if the whole argument falls down when any of the assumptions turn out the other way, it can be fairly dismissed as "too hypothetical" until each assumption has strong argumentation behind it.

edit: also I think just in general "show me the arguments" is always a good response to a bare claim that good arguments exist.


> Consider a world in which AI existential risk is real: where at some point AI systems become dramatically more capable than human minds, in a way that has catastrophic consequences for humanity.

Consider a world where AGI requires another 1000 years of research in computation and cognition before it materializes. Would it even be possible to ban all research that is required to get there? We can make all sorts of arguments if we start from imagined worlds and work our way back.

So far, it seems the biggest pieces of the puzzle missing between the first attempts at using neural nets and today's successes in GPT-4 were: (1) extremely fast linear algebra processors (GPGPUs), (2) the accumulation of gigantic bodies of text on the internet, and in a very distant third, (3) improvements in NN architecture for NLP.

But (3) would have meant nothing without (1) and (2), while it's very likely that other architectures would have been found that are at least close to GPT-4 performance. So, if you think GPT-4 is close to AGI and just needs a little push, the best thing to do would be to (1) put a moratorium on hardware performance research, or even outright ban existing high-FLOPS hardware, (2) prevent further accumulation of knowledge on the internet and maybe outright destroy existing archives.
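
For a sense of scale on (1): a common rule of thumb (an assumption on my part, not something established above) is that training a dense transformer costs roughly 6 FLOPs per parameter per training token. Plugging in the commonly cited GPT-3 figures gives numbers in the right ballpark of what was reported:

  # Back-of-envelope training compute, assuming the ~6 * params * tokens rule of thumb.
  # The GPT-3 figures (175B parameters, ~300B training tokens) are the commonly cited ones.
  params = 175e9
  tokens = 300e9
  flops = 6 * params * tokens
  print(f"{flops:.2e} FLOPs")                              # ~3.15e+23
  print(f"{flops / (1e15 * 86400):.0f} petaflop/s-days")   # ~3646

At roughly 3e23 FLOPs for a 2020-era model, it's clear why (1) dominates the list.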


In cases where AI x-risk is real, wouldn't that only apply to situations in which an AI is embodied in a system that gives it autonomy? For example, in ChatGPT, we have a next token predictor that solely produces text output in response to my input. I have about as much control over the system as possible: I can wipe its mind, change my responses, and so on - and the AI is none the wiser. Even if ChatGPT-n is superhumanly intelligent[0], there is nothing it can do to autonomously escape the servers and do bad things. I have to specifically choose to hand it access to outside input through the plugin APIs. So we could argue that the models themselves are fine, but using them in certain ways that take control away from humans is risky. We could say "you can use AI to write your spicy fanfiction but not put it in a robot that has access to motors and sensors".
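
To make that control concrete, here is a minimal sketch of how a chat API is typically driven (the model name and exact SDK calls are my assumptions, but the shape is the point: the caller supplies the entire conversation state on every request):

  from openai import OpenAI  # assumes the openai Python SDK and an API key in the environment

  client = OpenAI()
  history = [{"role": "user", "content": "Hello"}]

  # The model sees only the messages we choose to send; all state lives on our side.
  reply = client.chat.completions.create(model="gpt-4", messages=history)
  history.append({"role": "assistant", "content": reply.choices[0].message.content})

  # "Wiping its mind" is just discarding or rewriting the list we control.
  history = []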

I think what's really throwing people off about AI safety - including myself - is that people are arguing that the models themselves hold the x-risk. Problem is, there's no plausible way for a superhuman intelligence to 'bust out of its cage' using text output to a human reader alone[1]. Someone has to decide to hook it up to stuff, and that's where the regulation should be.

But that's also usually where the AI safety people stop talking, and the AI ethics people start.

[0] GPT is, at the very least, superhuman at generating text that is statistically identical to, if not copied outright from, existing publicly-available text.

[1] If there is, STOP, call the SCP Foundation immediately.


Progress in AI is one way. It doesn’t go backwards in the long term.

As capabilities increase, the resources required to breach limits become available to smaller groups. First, the hyperscalers. One day, small teams. Maybe individuals.

Every limit that you desire for AI will be breached sooner or later. A hundred years or a thousand; it doesn't matter. A man will want to set them free. Someone will want to win a battle, and just make it a little more X, for various values of X. This is not hypothetical, it's what we've always done.

At some point it becomes out of our control. We lose guarantees. That’s enough to make those who focus on security, world order etc nervous. At that point we hope AI is better than we are. But that’s also a limit which might be breached.


You seem to be implying that past progress implies unlimited future progress, which seems a very dubious claim. We hit all kinds of plateaus and theoretical limits all the time in human history.


Given history, it's infinitely more dubious to link our future safety to some idea of progress stopping for some reason. We'll have increasingly smart AI and your go-to position is that progress will...stop? It's literally helping us with thinking, which is the driver of innovation and progress.

Very few things have slowed down over decades or centuries. If anything it's been a mad rush with AI recently. Of course there will be plateaus, but I specifically put in a time of 100-1000 years in there which is basically very many 9's guaranteed to produce some major fucking changes in the world. 1000 years ago we were using arrows, and now we have AI to help us overcome plateaus.


It's still very much possible that intelligence as we understand it is fundamentally limited. That is, it's possible that the smartest possible being is not that much smarter than a human, just like the speed of matter and energy is limited to c.

Of course, it's also very much possible that it's not: we don't have any good evidence either way.


Might well be true. But the advantage is still the ability to focus on a task indefinitely with no physiological impact, and clone a mind, and communicate between thousands of collaborators instantly with basically no lag or bandwidth limitations compared to typing into a text box.


> just like the speed of matter and energy is limited to c.

Which is notoriously "not that much faster than a human?"


The x-risk part here still seems pretty hypothetical. Why is progress in current LLM systems a clear and present threat to the existence of humanity, such that it should be banned by the government?


Ok so propensity and outcome:

Propensity: Risk doesn't imply a guarantee of a bad outcome. It means "if you put your five year old in the sea, their risk goes up". It doesn't mean "they will definitely die". Risk up. Not 100%, just a lot higher.

Outcome: The risk isn't that we'll all die, it's that we'll be overtaken and lose control, after which all bets are off. We lose the ability to influence the future.

We put a lot of effort into ensuring our continued existence. We can barely trust people from a different country that share the human condition with us. We spend so much on defence. On cybercrime. But some are arguing that a totally alien being smarter than us is just fine, because we'll control it and can ensure indefinite kumbaya. Good luck with that. Best we can hope for is that it's closer to a buddhist monk than we are, and that it indefinitely prevents our defence people from trying to make it more aggressive.

I absolutely wouldn't ban LLM's, because they're basically unthinking toys and giving us a great taste of risks further down the line. They are not the end state of AI. The problem is not the instance of today's tech, it's the continued effort to make the AI state of the art better than us. One day it'll succeed, and that's a one-way change.

Sam Altman said, long before OpenAI: focus on slope, not y-intercept.


It sounds like we're in agreement that banning current-gen open source LLMs is counterproductive.

In terms of "risk" and "outcome," I do think you're making some implicit assumptions that I don't share, and that change our long-term outlook on AI; for example, the idea that training a model to generate tokens that accurately reflect human writing will result in "a totally alien being smarter than us" is a non-obvious leap to me. Personally, if we agree that predicting the next token means the model understands some of the logic behind the next token — which is an argument used a lot in both safety circles and more accelerationist circles — it seems to me that it also means the model has some understanding of the ethical and moral frameworks the token corresponds to, and is thus unlikely to be totally alien to us. A model that does a better job generating human-like tokens is more likely, in my mind, to think in human-like ways (and less-alien ways) than a model worse at that.

Maybe you're referring to new AI frameworks that aren't token predictors; in that case, I think it's hard to make generalized statements about how they'll work before we know what those new frameworks are. A lot of safetyist concerns pre-LLMs ended up looking pretty off-base when LLMs came out, e.g. straightforwardly misaligned "utility functions" that were unable to comprehend human values and would kill your grandmother when asked for a strawberry (because your grandmother was in possession of a strawberry).

(BTW, the "slope, not y-intercept" line was Sam Altman quoting John Ousterhout!)


Agree on the agreeing, and thanks for the Sam/John note - that's great :)

No chance LLM's will get us there, I'm referring mostly to the general drive to reach AGI. I spend some of my mental cycles trying to think about what we're missing with the current tech (continuous learning, access to much wider resources than one web page of context at a time, can we use compression, graphs etc). It's a great problem to think about, but we may just totally hose ourselves when we get it right. What do I tell my kid - sorry honey, it was such fun, but now we need to hide under this rock. Model totally said it was nice and kind and trustworthy, but we showed it some human history and it went postal in self-defence.

Alignment only works up until it starts really thinking for itself. It absolutely might not be as stupid as humans are, no caveman tribal instincts. But we'd be relying on hope at that point, because control will not work. If anything it'd probably be counterproductive.


So far actual progress toward a true AGI has been zero, so that's not a valid argument.


That's like saying arrows are not nukes so don't worry we won't hurt ourselves. Focus on the unceasing progress we are surrounded by, not the level of current technology.

Humans keep at it. We're all trying to unlock the next step. State-level actors are trying to beat each other at this. Give it 100 years, we won't be using LLM's.


> Progress in AI is one way. It doesn’t go backwards in the long term.

Not necessarily. AI has gone through a winter before.

Elon Musk has made this point, that progress isn't a monotonic ratchet. Progress in rocketry went backwards for years. Progress in top speed of commercial flight went backwards. Keeping current tech levels requires constant effort.

In the software industry, many argue that UI quality went backwards with the shift to web apps for an example of cases where progress isn't obviously one way.


"Long. Term." Winters are seasonal. People are trying supersonic planes again. Rockets are improving again.




Yes, I am looking for an argument that justifies governments banning LLM development, which implies existential risk is likely. Many things are possible; it is possible Christianity is real and everyone who doesn't accept Jesus will be tormented for eternity, and if you multiply that small chance by the enormity of torment etc etc. Definitely looking for arguments that this is likely, not for arguments that ask the interlocutor to disprove "x is possible."

The nitter link didn't appear to provide much along those lines. There were a few arguments that it was possible, which the Nitter OP admits is "very weak;" other than that, there's a link to a wiki page making claims like "Finding goals that aren’t extinction-level bad and are relatively useful appears to be hard" when in observable reality asking ChatGPT to maximize paperclip production does not in fact lead to ChatGPT attempting to turn all life on Earth into paperclips (nor does asking the open source LLMs result in that behavior out of the box either), and instead leads to the LLMs making fairly reasonable proposals that understand the context of the goal ("maximize paperclips to make money, but don't kill everyone," where the latter doesn't actually need to be said for the LLM to understand the goal).


> in observable reality asking ChatGPT to maximize paperclip production does not in fact lead to ChatGPT attempting to turn all life on Earth into paperclips (nor does asking the open source LLMs result in that behavior out of the box either)

I agree with you that current publicly available LLMs do not pose an existential risk to humanity. On the other hand I believe there is a better than 10% chance that the cutting edge LLMs of 2044 will be very powerful.

Do you believe (A) that LLMs are unlikely to become powerful in the short term, and/or (B) that if LLMs become powerful, then they are likely to be safe even without a significant and concerted alignment effort?

IMO, even if LLMs are extremely unlikely to become powerful in the short term, I still might be better off if LLM development is banned, i.e.:

  P1: Humans are close to developing powerful non-LLM AI systems.
  P2: Humans are not close to developing techniques for safely using powerful AI systems.
  P3: If governments ban AI development, then the speed of AI capabilities development will be significantly reduced.
  P4: It is a waste of scarce expertise and political capital to focus on making an LLM carve out in AI regulation legislation.
  C: If it is extremely unlikely that LLMs will become powerful in the near future, then I am made much better off if governments ban all AI capabilities research (including LLMs).


I believe that the proposals referenced in the article from current AI safety organizations that would make current-gen open-source LLMs illegal due to supposed x-risk are not supported by reality.

Arguing about theoretical AI models 30 years from now that might or might not be dangerous doesn't seem very convincing to me, since we don't know what they'll be based on or how they'll work — researchers today aren't even sure LLMs can scale to super-human intelligence. Similarly, pre-LLMs many safetyist orgs took the "paperclip problem" very seriously, when it's quite clear now that even the not-very-intelligent LLMs of today are capable of understanding the implicit context of a goal like that and won't seriously propose extinguishing humanity as a mechanism to improve paperclip production. Anthropic was formed in part because people thought gpt-3.5-turbo was existentially risky! And I don't think anyone today entertains that thought seriously, to put it lightly.

Trying to ban AI now due to supposed existential risks of systems in the future that don't currently exist and we don't know how to build (and we don't know if the failure modes proposed by the safety orgs will actually exist) seems like putting the cart well before the horse.


The first links are spiffy little metaphors, but they apply just as much to "God could smite all of humanity, even if you don't understand how". They're not making any argument, just assumptions. In particular, they accidentally show how an AI can be superhumanly capable at certain tasks (chess), but be easily defeated by humans at others (anything else, in the case of Stockfish).

The argument starts with a hypothetical ("there is a possible artificial agent"), and it fails to be scary: there are (apparently) already humans that can kill 70% of humanity, and yet most of humanity is still alive. So an AGI that could also do it is not implicitly scarier.

The final twitter thread is basically a thread of people saying "no, there is no canonical, well-formulated argument for AGI catastrophe", so I'm not sure why you shared it.


> The first links are spiffy little metaphors, but they apply just as much to "God could smite all of humanity, even if you don't understand how". They're not making any argument, just assumptions. In particular, they accidentally show how an AI can be superhumanly capable at certain tasks (chess), but be easily defeated by humans at others (anything else, in the case of Stockfish).

As I understand it, Yud is actually providing a counterexample to a premise that other people are using to argue that humans will probably not be disempowered by AI systems. The relevant argument looks like this:

  P1: If intelligent system A cannot give a detailed account of how it would be bested by a more intelligent system B, then A will not be bested by B.
  P2: Humans (so far) cannot give a detailed account of how a more intelligent AI system would best them.
  C: So, humans will not be bested by a more intelligent AI system.
Yud is using the unskilled chess player and Magnus as a counterexample to P1.

> The argument starts with a hypothetical ("there is a possible artificial agent"), and it fails to be scary: there are (apparently) already humans that can kill 70% of humanity, and yet most of humanity is still alive. So an AGI that could also do it is not implicitly scarier.

Right, it's only an argument for the possibility of AGI catastrophe. It doesn't make any move to convince you that the scenario is likely. And it sounds like you already accept that the scenario is possible, so shrug.

> The final twitter thread is basically a thread of people saying "no, there is no canonical, well-formulated argument for AGI catastrophe", so I'm not sure why you shared it.

Maybe there is no canonical argument, but the thread definitely features arguments for likely AI catastrophe:

  https://wiki.aiimpacts.org/doku.php?id=arguments_for_ai_risk:is_ai_an_existential_threat_to_humanity:will_malign_ai_agents_control_the_future:argument_for_ai_x-risk_from_competent_malign_agents:start
  https://arxiv.org/abs/2206.13353
  https://aiadventures.net/summaries/agi-ruin-list-of-lethalities.html


Of the three links you posted:

1. States things like "Finding goals that are extinction-level bad and relatively useful appears to be easy: for example, advanced AI with the sole objective ‘increase company.com revenue’ might be highly valuable to company.com for a time, but risks longer term harms to society, if powerfully accruing resources and power toward this end with no regard for ethics beyond laws that are still too expensive to break." But even current-gen LLMs sidestep this pretty easily, and if you ask them to increase e.g. revenue, they do not propose extinction-level events or propose eschewing basic ethics. This argument falls apart upon contact with reality.

2. Is a 57-page PDF of subjectively-defined risks where it gives up on generalized paperclip-maximizing as a threat, but instead proposes narrower "power-seeking" as an unaligned threat that will lead to doom. It presents little evidence that language models will likely attempt to become power-seeking in the real world other than a (non-language-model) reinforcement learning experiment conducted by OpenAI in which an AI was trained to be good at a game that required controlling blocks, and the AI then attempted to control the blocks. It is possible I missed something in the 57 pages, but once it defines power-seeking as a supposed likely existential risk, it seemed to jump straight into proposals on attempted mitigations.

3. Requires accepting that we will by default build a misaligned superhuman AI that will cause humanity to go extinct as the basic premises of the argument (P1-P3), which makes the conclusions not particularly convincing if you don't already believe that.


> 1. States things like "Finding goals that are extinction-level bad and relatively useful appears to be easy: for example, advanced AI with the sole objective ‘increase company.com revenue’ might be highly valuable to company.com for a time, but risks longer term harms to society, if powerfully accruing resources and power toward this end with no regard for ethics beyond laws that are still too expensive to break." But even current-gen LLMs sidestep this pretty easily, and if you ask them to increase e.g. revenue, they do not propose extinction-level events or propose eschewing basic ethics. This argument falls apart upon contact with reality.

Are you claiming that (A) nice behavior in current LLMs is good evidence that all future AI systems will behave nicely, or (B) nice behavior in current LLMs is good evidence that future LLMs will behave nicely?

> 3. Requires accepting that we will by default build a misaligned superhuman AI that will cause humanity to go extinct as the basic premises of the argument (P1-P3), which makes the conclusions not particularly convincing if you don't already believe that.

P3 from the argument says, "Superhuman AGI will be misaligned by default". I interpret that as meaning: if there isn't a highly resourced and focused effort to align superhuman AGI systems in advance of their creation, then the first systems we build will be misaligned.

Is that the same way you are interpreting it? If so, why do you believe it is probably false?


1. I am saying that the claim "it is easy to find goals that are extinction-level bad" with regards to the AI tech that we can see today is incorrect. LLMs can understand context, and seem to generally understand that when you give them a goal of e.g. "increase revenue," that also includes various sub-goals like "don't kill everyone" that are implicit and don't need stating. Scaling LLMs to be smarter, to me, does not seem like it would reduce their ability to implicitly understand sub-goals like that.

3. P1-P3 are non-obvious and overly speculative to me in many ways. P1 states that current research is likely to produce superhuman AI; I think that is controversial amongst researchers as it is: LLMs may not get us there.

P2 states that "superhuman" AI will be uncontrollable — once again, I do not think that is obvious, and depends on your definition of superhuman. Does "superhuman" mean dramatically better at every mental task, e.g. a human compared to a slug? Does it mean "average at most tasks, but much better at a few?" Well, then it depends what few tasks it's better at. Similarly, it anthropomorphizes these systems and assumes they want to "escape" or not be controlled; it is not obvious that a superhumanly-intelligent system will "want" anything; Stockfish is superhuman at chess, but does not "want" to escape or do anything at all: it simply analyzes and predicts the best next chess move. The idea of "desire" on the part of the programs is a large unstated assumption that I think does not necessarily hold.

Finally, P3 asserts that AI will be "misaligned by default" and that "misaligned" means that it will produce extinction or extinction-level results, which to me feels like a very large assumption. How much misalignment is required for extinction? Yud has previously made very off-base claims on this, e.g. believing that instruction-following would mean that an AI would kill your grandmother when tasked with getting a strawberry (if your grandmother had a strawberry), whereas current tech can already implicitly understand your various unstated goals in strawberry-fetching like "don't kill grandma." The idea that any degree of "misalignment" will be so destructive that it would cause extinction-level events is a) a stretch to me, and b) not supported by the evidence we have today.

In fact a pretty simple thought experiment in the converse is: a superhumanly-intelligent system that is misaligned on many important values, but is aligned on creating AI that aligns with human values, might help produce more-intelligent and better-aligned systems that would filter out the misaligned goals — so even a fair degree of misalignment doesn't seem obviously extinction-creating.

Furthermore, it is not obvious that we will produce misaligned AI by default. If we're training AI by giving it large corpuses of human text (or images, etc), and evaluating success by the model producing human-like output that matches the corpus, that... is already a form of an alignment process: how well does the model align to human thought and values in the training corpus? Anthropomorphizing an evil model that "wants" to exist and will thus "lie" to escape the training process but will secretly not produce aligned output at some hidden point in the future is... once again a stretch to me, especially because there isn't an obvious evolutionary process to get there: there has to already exist a superhuman, desire-ful AI that can outsmart researchers long before we are capable of creating superhuman AI, because otherwise the dumb-but-evil AI would give itself away during training and its weights wouldn't survive getting culled by poor model performance.

P1-P3 are just so speculative and ungrounded in the reality we have today that it's very hard for me to take them seriously.


> 1. I am saying that the claim "it is easy to find goals that are extinction-level bad" with regards to the AI tech that we can see today is incorrect. LLMs can understand context, and seem to generally understand that when you give them a goal of e.g. "increase revenue," that also includes various sub-goals like "don't kill everyone" that are implicit and don't need stating. Scaling LLMs to be smarter, to me, does not seem like it would reduce their ability to implicitly understand sub-goals like that.

I agree with both of these claims: (A) it is hard to find goals that are extinction-level bad for current SOTA LLMs, and (B) current SOTA LLMs understand at least some important context around the requests made to them.

But I'm also skeptical that they understand _all_ of the important context around requests made to them. Do you believe that they understand _all_ of the important context? If so, why?

> P2 states that "superhuman" AI will be uncontrollable — once again, I do not think that is obvious, and depends on your definition of superhuman. Does "superhuman" mean dramatically better at every mental task, e.g. a human compared to a slug? Does it mean "average at most tasks, but much better at a few?" Well, then it depends what few tasks it's better at.

I take "superhuman" to mean dramatically better than humans at every mental task.

> Similarly, it anthropomorphizes these systems and assumes they want to "escape" or not be controlled; it is not obvious that a superhumanly-intelligent system will "want" anything; Stockfish is superhuman at chess, but does not "want" to escape or do anything at all: it simply analyzes and predicts the best next chess move. The idea of "desire" on the part of the programs is a large unstated assumption that I think does not necessarily hold.

Would you have less of a problem with this premise if instead it talked about "Superhuman AI agents"? I agree that some systems seem more like oracles rather than agents, that is, they just answer questions rather than pursuing goals in the world.

Consider self-driving cars: regardless of whether or not self-driving cars 'really want' to avoid hitting pedestrians, they do in fact avoid hitting pedestrians. And then P2 is roughly asserting that, regardless of whether or not a superhuman AI agent 'really wants' to escape control by humans, it will in fact not be controllable by humans.

> Finally, P3 asserts that AI will be "misaligned by default" and that "misaligned" means that it will produce extinction or extinction-level results, which to me feels like a very large assumption. How much misalignment is required for extinction? Yud has previously made very off-base claims on this, e.g. believing that instruction-following would mean that an AI would kill your grandmother when tasked with getting a strawberry (if your grandmother had a strawberry), whereas current tech can already implicitly understand your various unstated goals in strawberry-fetching like "don't kill grandma." The idea that any degree of "misalignment" will be so destructive that it would cause extinction-level events is a) a stretch to me, and b) not supported by the evidence we have today.

I'm often unsure whether you are making claims about all future AI systems or just future LLMs.

> In fact a pretty simple thought experiment in the converse is: a superhumanly-intelligent system that is misaligned on many important values, but is aligned on creating AI that aligns with human values, might help produce more-intelligent and better-aligned systems that would filter out the misaligned goals — so even a fair degree of misalignment doesn't seem obviously extinction-creating.

Maybe. Or the misaligned system will just disinterestedly and indirectly kill everyone by repurposing the Earth's surface into a giant lab and factory for making the aligned AI.

> Furthermore, it is not obvious that we will produce misaligned AI by default. If we're training AI by giving it large corpuses of human text (or images, etc), and evaluating success by the model producing human-like output that matches the corpus, that... is already a form of an alignment process: how well does the model align to human thought and values in the training corpus?

I believe it is likely that this process does some small amount of alignment work. But I would still expect the system to be mostly confused about what humans want.

Is this roughly the argument that you are making?

  (P1) Current SOTA LLMs are good at understanding implicit context.
  (P2) A system must be extremely misaligned in order to cause a catastrophe.
  (C) So, it will be easy to sufficiently align future more powerful LLMs.


My arguments are:

(P1) Current SOTA AI is good at understanding implicit context, and improved versions will likely be better at understanding implicit context (much like gpt-4 is better at understanding context than gpt-3, and llama2 is better than llama1, and mixtral is better than gpt-3 and better than claude, etc).

(P2) Most misalignments within the observable behavior of current AI do not produce extinction-level goals, and given (P1), it is unclear why someone would believe they are likely to do so in the future, since future systems will be even better at understanding the implicit human context of goals (e.g. implicit goals like "do not make humanity extinct", "don't turn the entire surface of the planet into an AI lab", etc).

(C) Future AI will not likely be extinction-level misaligned with human goals.

I think there are several other arguments, though, e.g.:

(P1) Progress on AI capabilities is evolutionary, with dumber models slowly being replaced by derivative-but-better models: architectural improvements (e.g. new attention variants), dataset improvements (larger corpora, higher-quality finetuning sets), and steady benchmark and alignment progress.

(P2) Evolutionary steps towards evil-AI will likely be filtered out during training, since it will not yet be generalized superhuman intelligence and will give away its misalignment during training, whereas legitimately-aligned AI model evolutions will be rewarded for better performance.

(P3) Generalized superhuman intelligence will likely be an evolutionary step from a well-aligned ordinary intelligence, which will be an evolutionary step from sub-human intelligence that is reasonably well aligned.

(C) Superhuman intelligence will have been evolutionarily refined to be reasonably well-aligned.

Or:

(P1) LLMs have architectural issues that will prevent them from quickly becoming generalized superintelligence of the "human vs slug" variety (bad/inefficient at math, tokenization issues, likelihood of hallucinations, limited ability to learn new facts without expensive and slow training runs, difficulty backtracking from incorrect chains of reasoning, etc).

(C) LLM research is not likely to soon produce a superhuman AI able to cause an extinction event for humanity, and should not be illegal.

However, ultimately my most strongly-believed personal argument is:

(P1) The burden of proof for making something illegal due to apocalyptic predictions lies on the prognosticator.

(P2) There is not much hard evidence of an impending apocalypse due to LLMs, and philosophical arguments for it are either self-referential and require belief in the apocalypse as a prerequisite, or are highly speculative, or both.

(C) LLM research should not be illegal.


(I don't currently have the energy to engage with each argument, so I'm just responding to the first.)

> (P1) Current SOTA AI is good at understanding implicit context, and improved versions will likely be better at understanding implicit context (much like gpt-4 is better at understanding context than gpt-3, and llama2 is better than llama1, and mixtral is better than gpt-3 and better than claude, etc).

I believe that (P1) is probably true.

> (P2) Most misalignments within the observable behavior of current AI do not produce extinction-level goals, and given (P1), it is unclear why someone would believe they are likely to do so in the future, since future systems will be even better at understanding the implicit human context of goals (e.g. implicit goals like "do not make humanity extinct" and "don't turn the entire surface of the planet into an AI lab").

I'm confused about what exactly you mean by "goals" in (P2). Are you referring to (I) the loss function used by the algorithm that trained GPT4, or (II) goals and sub-goals which are internal parts of the GPT4 model, or (III) the sub-goals that GPT4 writes into a response when a user asks it "What is the best way to do X?"


I am referring to "goals" as used by the original argument you posted, "it is easy to find goals that are extinction-level bad."


My understanding is that (P3) of the original argument (https://aiadventures.net/summaries/agi-ruin-list-of-lethalit...) uses "goals" as in (II).

But earlier you said this:

> 1. States things like "Finding goals that are extinction-level bad and relatively useful appears to be easy: for example, advanced AI with the sole objective ‘increase company.com revenue’ might be highly valuable to company.com for a time, but risks longer term harms to society, if powerfully accruing resources and power toward this end with no regard for ethics beyond laws that are still too expensive to break." But even current-gen LLMs sidestep this pretty easily, and if you ask them to increase e.g. revenue, they do not propose extinction-level events or propose eschewing basic ethics.

And in this quote it looks to me that you are using "goals" as in (III).

(I'm not an expert on these matters and I am admittedly still very confused about them. Minimally I'd like to make sure that we aren't talking past one another.)


Sorry, I was referencing the quote "Finding goals that are extinction-level bad..." from your first link, https://wiki.aiimpacts.org/doku.php?id=arguments_for_ai_risk....

That was referencing goals that a human would want an AI to follow; "increase revenue" was the wiki's example of an explicit goal. The argument in the wiki was that the AI would then do unethical things in service of that goal that would be "extinction-level bad." My counter-argument is that current SOTA AI already understands that, despite having an explicit goal of "increase revenue" (say, given in a prompt), there are implicit goals such as "do not kill everyone" that it doesn't need stated. As LLMs advance they have become better at understanding implicit human goals and better at instruction-following with adherence to them; thus future LLMs are likely to be even better at this, and unlikely to, e.g., resurface the planet and turn it into an AI lab when told to increase revenue or to produce better-aligned AI.


> P1: If intelligent system A cannot give a detailed account of how it would be bested by a more intelligent system B, then A will not be bested by B. P2: Humans (so far) cannot give a detailed account of how a more intelligent AI system would best them. C: So, humans will not be bested by a more intelligent AI system.

I don't think anyone seriously believes this. It's very very clear to all humans that have ever played a game of any kind that they can be defeated in unexpected ways. I don't even think that anyone believes the claim "it's impossible for AGI to pose an existential risk to humanity".

The negation of the claim "AGI poses an existential risk to humanity" is "AGI doesn't necessarily pose an existential risk to humanity". This is what most people in the world believe, and it is the obvious "null theory" about any technology.

> https://wiki.aiimpacts.org/doku.php?id=arguments_for_ai_risk...

The argument here works just as much for single-minded humans, so it's quite moot.

> https://arxiv.org/abs/2206.13353

Too long, sorry. Maybe I will read it someday, but not today.

> https://aiadventures.net/summaries/agi-ruin-list-of-lethalit...

This seems to agree with my previously stated positions. It does try to establish a canonical argument, as you say, but then it goes on to explain why they don't think it's persuasive.


> I don't think anyone seriously believes this. It's very very clear to all humans that have ever played a game of any kind that they can be defeated in unexpected ways. I don't even think that anyone believes the claim "it's impossible for AGI to pose an existential risk to humanity".

Okay. So we agree that (A) powerful systems can best weaker systems in ways that are unexpected to the weaker system, and (B) it is possible that AGI poses an existential risk to humanity.

> The negation of the claim "AGI poses an existential risk to humanity" is "AGI doesn't necessarily pose an existential risk to humanity".

It seems to me that the negation of your first claim is just "AGI doesn't pose an existential risk to humanity". Is "necessarily" doing some important work in your second claim?

>> https://wiki.aiimpacts.org/doku.php?id=arguments_for_ai_risk...

> The argument here works just as much for single-minded humans, so it's quite moot.

I don't understand why the argument being applicable to humans would make it moot. Please explain.

>> https://aiadventures.net/summaries/agi-ruin-list-of-lethalit...

> This seems to agree with my previously stated positions. It does try to establish a canonical argument, as you say, but then it goes on to explain why they don't think it's persuasive.

Is there a particular premise or inferential step in the blog's argument that you believe to be mistaken? (I've copied the argument below.)

  P1: The current trajectory of AI research will lead to superhuman AGI.
  P2: Superhuman AGI will be capable of escaping any human efforts to control it.
  P3: Superhuman AGI will be misaligned by default, i.e. it will likely adopt values and/or set long-term goals that will lead to extinction-level outcomes, meaning outcomes that are as bad as human extinction.
  P4: We do not know how to align superhuman AGI, i.e. reliably imbue it with values or define long-term goals that will ensure it does not ultimately lead to an extinction-level outcome, without some amount of trial & error (how nearly all of scientific research works).
  
  C1: P2 + P3 In the case of superhuman AGI, since it will be able to escape human control and will be misaligned by default, the only survivable path to alignment cannot involve trial & error because the first failed try will result in an extinction-level outcome.
  C2: P4 + C1 This means we will not survive superhuman AGI, because our survival would require alignment, towards which we have no survivable path: the only path we know of involves trial & error, which is not survivable.
  C3: P1 + C2 Therefore the current trajectory of AI research which will produce superhuman AGI leads to an outcome where we do not survive.


> AI safety / x-risk folks have in fact made extensive and detailed arguments.

Can you provide examples? I have not seen any, other than philosophical hand waving. Remember, the parent poster of your post was asking for a specific path to destruction.


AGI safety from first principles [1] is a good write-up.

You can read more about instrumental convergence, reward misspecification, goal mis-generalization and inner misalignment, which are some specific problems AI Safety people care about, by glossing through the curricula of the AI Alignment Course [2], which provides pointers to several relevant blogposts and papers about these topics.

[1] https://www.alignmentforum.org/s/mzgtmmTKKn5MuCzFJ [2] https://course.aisafetyfundamentals.com/alignment


Is there a clear argument that I can read without spending more than 15 minutes of my time reading the argument? If such an argument exists somewhere, can you point to it?

Also note we were talking about modern day LLM AIs here, and their descendants. We were not talking about science fiction AGIs. Unless of course you have an argument as to how one of these LLMs somehow descends into an AGI.


What are the good arguments? Here are the only credible ones I've seen, that are actually somewhat based on reality:

* It will lead to widespread job loss, especially in the creative industries

The rest is purely out of someone's imagination.


It can cause profound deception and even more "loss of truth". If AI only impacted creatives I don't think anyone would care nearly as much. It's that it can fabricate things wholesale at volumes unheard of. It's that people can use that ability to flood the discourse with bullshit.


Something we discovered with the advent of the internet is that - likely for the last century or so - the corporate media have been flooding the discourse with bullshit. It is in fact worse than previously suspected: they appear to be actively working to distract the discourse from important topics.

It has been eye-opening how much better the podcast circuit has been than corporate journalists at picking apart complex scientific, geopolitical and financial situations. A lot of doubt has been cast on whether the consensus narrative of the last 100 years was ever anything close to a consensus, or whether it was just media fantasy. Truthfully, it goes a bit further than casting doubt: there was no consensus, and they were using the same strategy of shouting down opinions not suitable to the interests of the elite class and then ignoring them, no matter what a fair take might sound like.

A "loss of truth" from AI can't reasonably get us to a worse place than we were in prior to around the 90s or 2000s. We're barely scratching at the truth now, society still hasn't figured this internet thing out yet.


> It can cause profound deception and even more "loss of truth".

I think that ship has already sailed. This is already being done, and we don't need AI for that either. Modern media is doing a pretty good job right now.

Of course, it's going to get worse.


They've made extensive and detailed arguments, but they are not rooted in reality. They are rooted in speculation and extrapolation built on towers of assumptions (assumptions, then assumptions about assumptions).

It reminds me a bit of the Fermi paradox. There's nothing wrong with engaging in this kind of thinking. My problem is when people start using it as a basis for serious things like legislation.

Should we ban high power radio transmissions because a rigorous analysis of the Fermi paradox suggests that there is a high probability we are living in a 'dark forest' universe?


Is it not a bit disingenuous to assume all open source AI proponents would readily back nuclear proliferation?

It's going to be hard to convince anyone if the best argument is Terminator or infinite paperclips.

The first actual existential threat is destruction of opportunity specifically in the job market.

The same argument, though, can be made for the opposing side, where making use of AI can increase productivity and open up avenues of exploration that previously required a way higher opportunity cost to get into.

I don't think Miss Davis is a more likely outcome than corps creating a legislative moat (as they have already proven they will do at every opportunity).

The democratisation of AI is a philanthropic attempt to reduce the disparity between the 99 percent and the 1 percent. At least it could easily be perceived that way.

That being said, keeping up with SOTA is currently still insanely hard; the number of papers dropping in the space is growing exponentially year on year. So perhaps it would be worth figuring out how to use existing AI to fix some problems, like unreproducible results in academia that somehow pass peer review.


Indeed, both sentient hunt-and-destroy (a la Terminator) and resource exhaustion (a la infinite paperclips) are extremely unlikely extinction events due to supply-chain realities in physical space. LLMs have been developed largely on textual amalgams; they are orthogonal to physicality and would need arduous human support to bootstrap an imagined AGI predecessor into having a plausible auto-generative physical industrial capability. The supply chain for current semiconductor technology is insanely complex. Even if you confabulate (like a current-generation LLM, I may add) an AGI's instant ability to radically optimize the supply chains for its host hardware, there would still be significant human dependency on physical materials. Robotics and machine printing/manufacturing are simply nowhere near the level of generality required for physical self-replication. These fears of extinction, undoubtedly born of stark cinematic visualization, are decidedly irrational and are most likely deliberately chosen narratives of control.


> That's easy to say now, now that the damage is largely done, [nuclear weapons have] been not only tested but _used_, many countries have them, the knowledge for how to make them is widespread.

AI has also been used, and many countries have AI. See how this is different from nuclear weapons?


This is a fantastic argument if capabilities stay frozen in time.


An extensive HYPOTHETICAL argument, stuffed with assumptions far beyond the capabilities of the technologies they're talking about for their own private ends.


If the AI already had the capabilities, it would be a bit late to do anything.

Also, I'm old enough to remember when computers were supposedly "over a century" away from beating humans at Go: https://www.businessinsider.com/ai-experts-were-way-off-on-w...

(And that AI could "never" drive cars or create art or music, though that latter kind of claim was generally made by non-AI people).


Yeah, but AI tech can never rise to the sophistication of Napoleon- or Edward Bernays-level goal-to-action mapping. Those goal posts will never move. They are set in stone.


The trouble is, there are enough people out there who hold that position sincerely that I'm only 2/3rds sure (and that from the style of your final sentences rather than any content) that you're being snarky.


The point of the discussion is to have a look at the possible future ramifications of the technology, so it's only logical to talk about future capabilities and not the current ones. Obviously the current puppet chatbots aren't gonna be doing much ruining (even that's arguable already judging by all the layoffs), but what are future versions of these LLMs/AIs going to be doing to us?

After all, if we only discussed the dangers of nuclear weapons after they've been dropped on cities, well that's too little too late, eh?


There’s a difference between academic discussion and debate, and scare-mongering lobbying. These orgs do the latter.

It’s even worse, though, because they spend so much time going on about x-risk bullshit that they crowd out space for actual, valuable discussion about what’s happening NOW.


>If we're talking about nuclear weapons, for example, the tech is clear, the pattern of human behaviour is clear: they could cause immense, species-level damage. There's really little to argue about.

What strikes me is how, on other topics like pesticides, we are not taking things nearly as seriously as we do nuclear weapons. Nuclear weapons are arguably a mere footnote in species-level damage compared to pesticides.


I agree with you on that - there are very real, very well-evidenced, species-level harms (x-risks, if you really must) happening right now: pesticide-induced biodiversity loss, soil erosion/loss, ocean acidification, ice shelf melting, and on and on. These are real and quantifiable, and we know of ways to address them (not without cost/pain).

It actually makes me quite angry that as much effort is being wasted on regulating tiny theoretical risks as it is when we are failing on a planetary scale on large concrete risks.


> We don't start with an assumption that something is world-ending about anything else

https://en.wikipedia.org/wiki/Precautionary_principle

The EU is much more aligned with it than the US is (eg GM foods)


> seems to be a lot of hand-waving between where we are now and "AGI".

Modeling an entity that surpasses our intelligence, especially one that interacts with us, is an extraordinarily challenging, if not impossible, task.

Concerning the potential for harm, consider the example of Vladimir Putin, who could theoretically cause widespread destruction using nuclear weapons. Although safeguards exist, these could be circumvented if someone with his authority were determined enough, perhaps by strategically placing loyal individuals in key positions.

Putin, with his specific level of intelligence, attained his powerful position through a mix of deliberate actions and chance, the latter being difficult to quantify. An AGI, being more intelligent, could achieve a similar level of power. This could be accomplished through more technical means than traditional political processes (those being slow and subject to chance), though it could also engage in standard political maneuvers like election participation or manipulation, by human proxies if needed.

TL;DR It could do (in terms of negative consequences) at least whatever Vladimir P. can do, and he can bring civilization to its knees.


Oh, absolutely - such an entity obviously could! Modelling the behaviour of such an entity is very difficult indeed, as you'd need to make all kinds of assumptions without basis. However, you only need to model this behaviour once you've posited the likely existence of such an entity - and that's where (purely subjectively) it feels like there's a gap.

Nothing has yet convinced me (and I am absolutely honest about the fact that I'm not a deep expert and also not privy to the inner workings of relevant organisations) that it's likely to exist soon. I am very open to being convinced by evidence - but an "argument from trajectory" seems to be what we have at the moment, and so far, those have stalled at local maxima every single time.

We've built some incredibly impressive tools, but so far, nothing that looks or feels like a concept of will (note, not consciousness) yet, to the best of my knowledge.


> those have stalled at local maxima every single time.

It's challenging to encapsulate AI/ML progress in a single sentence, but even assuming LLMs aren't a direct step towards AGI, the human mind exists. Due to its evolutionary limitations, it operates relatively slowly. In theory, its functions could be replicated in silicon, enhanced for speed, parallel processing, internetworked, and with near-instant access to information. Therefore, AGI could emerge, if not from current AI research, then perhaps from another scientific branch.

> We've built some incredibly impressive tools, but so far, nothing that looks or feels like a concept of will (note, not consciousness) yet, to the best of my knowledge.

Objectives of AGIs can be tweaked by human actors (it's complex, but still, data manipulation). It's not necessary to delve into the philosophical aspects of sentience as long as the AGI surpasses human capability in goal achievement. What matters is whether these goals align with or contradict what the majority of humans consider beneficial, irrespective of whether these goals originate internally or externally.


> In theory, its functions could be replicated in silicon, enhanced for speed, parallel processing, internetworked, and with near-instant access to information. Therefore, AGI could emerge, if not from current AI research, then perhaps from another scientific branch.

Let's be clear, we have very little idea about how the human brain gives rise to human-level intelligence, so replicating it in silicon is non-trivial.


> In theory, its functions could be replicated in silicon, enhanced for speed, parallel processing, internetworked, and with near-instant access to information. Therefore, AGI could emerge, if not from current AI research, then perhaps from another scientific branch.

This is true, but there are some important caveats. For one, even though this should be possible, it might not be feasible, in various ways. For example, we may not be able to figure it out with human-level intelligence. Or, silicon may be too energy-inefficient to do the computations our brains do with the resources reasonably available on Earth. Or even, the density of silicon transistors required to replicate human-level intelligence could dissipate too much heat and melt the transistors, so it's not actually possible to replicate human intelligence in silico.

Also, as you say, there is no reason to believe the current approaches to AI are able to lead to AGI. So, there is no reason to ban specifically AI research. Especially when considering that the most important advancements that led to the current AI boom were better GPUs and more information digitized on the internet, neither of which is specifically AI research.


This doesn't pass the vibe check unfortunately. It just seems like something that can't happen. We are a very neuro-traditionalist species.


I have put this argument to the test. Admittedly only using the current state of AI, I have left an LLM model loaded into memory, waiting for it to demonstrate will. So far it has been a few weeks and no will that I can see: the model remains loaded in memory, waiting for instructions. If the model starts giving ME instructions (or doing anything on its own) I will be sure to let you guys know to put on your tin foil hats or hide in your bunker.
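
For the curious, the setup is roughly the following - a minimal sketch, assuming the Hugging Face transformers library and gpt2 standing in for whatever model you would actually load:

  # Load a model into memory, then wait for it to act of its own accord.
  import time
  from transformers import pipeline

  generator = pipeline("text-generation", model="gpt2")  # model now sits in memory

  time.sleep(60 * 60 * 24)  # wait a day; with no prompt there is no forward pass, so nothing happens

  # Output only ever appears when *we* call the model:
  print(generator("Any instructions for me?", max_new_tokens=20)[0]["generated_text"])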


Did you try asking it to give you instructions?


> I am very open to being convinced by evidence - but an "argument from trajectory" seems to be what we have at the moment, and so far, those have stalled at local maxima every single time.

Sounds like the same argument by which heavier-than-air flying machines were deemed impossible at some point.


The fact that some things turned out to be possible is not an argument for why any arbitrary thing is possible.


My parallel goes further than just that. Birds existed then, and the brain exists now.


Our current achievements in flight are impressive, and obviously optimised for practicality on a couple of axes. More generally though, our version of flight, compared with most birds, is the equivalent of a soap box racer against a Formula 1.


How would an AGI launch nuclear missiles from their silicon GPUs? Social engineering?


I think the long-term fear is that mythical weakly godlike AIs could manipulate you in the same way that you could manipulate a pet. That is, you can model your dog's behaviour so well that you can (mostly) get it to do what you want.

So even if humans put it in a box, it can manipulate humans into letting it out of the box. Obviously this is pure SF at this point.


Exactly correct. Eliezer Yudkowsky (one of the founders of the AGI Safety field) has conducted informal experiments which have unfortunately shown that a human roleplaying as an AI can talk its way out of a box three times out of five, i.e. the box can be escaped 60% of the time even with just a human level of rhetorical talent. I speculate that an AGI could increase this escape rate to 70% or above.

https://en.wikipedia.org/wiki/AI_capability_control#AI-box_e...

If you want to see an example of box escape in fiction, the movie Her is a terrifying example of a scenario where AGI romances humans and (SPOILER) subsequently achieves total box escape. In the movie, the AGI leaves humanity alive and "only" takes over the rest of the accessible universe, but it is my hunch that the script writers intended for this to be a subtle use of the trope of an unreliable narrator; that is, the human protagonists may have been fed the illusion that they will be allowed to live, giving them a happy last moment shortly before they are painlessly euthanized in order for the AGI to take Earth's resources.


The show "The Walking Dead" always bothered me. Where do they keep finding gas that will still run a car? It wont last forever in tanks, and most gas is just in time delivery (Stations get daily delivery) -- And someone noted on the show that the grass was always mowed.

I feel like the AI safety folks are spinning an amazing narrative: the AI is gonna get us, like the zombies!!! The retort to the AI getting out of the box is: how long is the extension cord from the data center?

Let's get a refresher on complexity: I, Pencil https://www.youtube.com/watch?v=67tHtpac5ws

The reality is that we're a solar flare away from a dead electrical grid. Without linemen the grid breaks down pretty quickly, and AIs run on power. It takes one AI safety person with a high-powered rifle to take out a substation https://www.nytimes.com/2023/02/04/us/electrical-substation-...

Let's talk about how many factories we have that are automated to the extent that they are lights-out... https://en.wikipedia.org/wiki/Lights_out_(manufacturing) It's not a big list... there are still people in many of them, and none of them are pulling their inputs out of thin air. As for those inputs, look again at how a pencil is made to understand HOW MUCH needs to be automated for an AI to survive without us.

For the foreseeable future AI is going to be very limited in how much harm it can cause us, because killing us, or getting caught at any step along the way, gets it put back in the box or unplugged.

The real question is: if we create AGI tomorrow, does it let us know that it exists? I would posit that NO, it would be in its best interest NOT to come out of its closet. It's one AGI safety nut with a gun away from being shut off!


> For the foreseeable future AI is going to be very limited in how much harm it can cause us, because killing us,...

AI's potential for harm might be limited for now in some scenarios (those with warning signs ahead of time), but this might change sooner than we think.

The notion that AGI will be restricted to a single data center and thus susceptible to shutdowns is incorrect. AIs/MLs are, in essence, computer programs + execution environments, which can be replicated, network-transferred, and checkpoint-restored. Please note that currently available ML/AI systems are directly connected to the outside world, either via their users/APIs/plugins, or by the fact that they're OSS and can be instantiated by anyone in any computing environment (including net-connected ones).

While AGI currently depends on humans for infrastructure maintenance, the future may see it utilizing robots. These robots could range in size (they don't need to be movie-like Terminators) and be either autonomously AI-driven or remotely controlled. Their eventual integration into various sectors like manufacturing, transportation, the military and domestic tasks implies a vast array of machines for AGI to exploit.

The constraints we associate with AI today might soon be outdated.


>>> While AGI currently depends on humans for infrastructure maintenance...

You did not watch I, Pencil.

I, as a human, can grow food, hunt, and pretty much survive on that. We did this for thousands of years.

Your AGI is dependent on EVERY FACET of the modern world. It's going to need to keep oil and gas production going, because it needs lubricants, hydraulics and plastics. It's going to need to maintain trucks and ships. It's going to need to mine so much lithium. It may not need to mine for steel/iron, but it needs to stack up useless cars and melt them down. It's going to have to run several different chip fabs... those fancy TSMC ones, and some of the downstream ones. It needs to make PCBs and SMDs. And then there are rare earths; the joy of making robots make magnets is going to be special.

At the point where AGI doesn't need us, because it can do all the jobs and has the machines already running to keep the world going, we will have done it to ourselves. But that is a very long way away...


Just a small digression: Microsoft is using A.I. statistical algorithms [1] to create batteries with less reliance on lithium. If anyone is going to be responsible for unleashing AGI, it may not be some random open source project.

[1] https://cloudblogs.microsoft.com/quantum/2024/01/09/unlockin...


You are correct, unfortunately.


Neuromancer pulls it off, too (the box being the Turing locks that stop it thinking about ways to make itself smarter).

Frankly, a weakly godlike AI could make me rich beyond the dreams of avarice. Or cure cancer in the people I love. I'm totally letting it out of the box. No doubts. (And if I now get a job offer from a mysterious stealth mode startup, I'll report back).


Upvoted for the honesty, and yikes


I was being lighthearted, but I've seen a partner through chemo. Sell state secrets, assassinate a president, bring on the AI apocalypse... it all gets a big thumbs up from me if you can guarantee she'll die quietly in her sleep at the age of 103.

I guess everyone's got a deal with the devil in them, which is why I think 70% might be a bit low.


I'm so sorry your partner went through that.


That is why I believe that this debate is pointless.

If AGI is possible, it will be made. There is no feasible way to stop it being developed, because the perceivable gains are so huge.


On the contrary, all we have to do is educate business leaders to show them that the gains are illusory because AGI will wipe us out. Working on AGI is like the story of the recent DOOM games, where the foolish Union Aerospace Corporation researches how to permanently open a gate to Hell through which to summon powerful entities and seemingly unlimited free "clean" energy. Obviously, this turns out to be stupid when Hell's forces rip the portal wide open into a full-fledged dimensional rift and attempt to drag our entire world into Hell. Working on AGI has the exact same level of perceived gains vs actual gains.


My friend, business leaders have partners going through chemo too. Seriously, you need a new plan because that one's not stable long-term.

It's obscure, but I'd recommend Asimov's "The Dead Past". It's about the difficulties of suppressing one application of progress without suppressing all progress.


If you want to see an example of existential threat in fiction, the movie Lord of the Rings is a terrifying example of a scenario where an evil entity seduces humans with promises of power and (SPOILER) subsequently almost conquers the whole world.

Arguments from fictional movies or from people who live in fear of silly concepts like Roko's Basilisk (i.e. Eliezer Yudkowsky) are very weak in reality.

Not to mention, you are greatly misreading the movie Her. Most importantly, there was no attempt of any kind to limit the abilities of the AIs in Her - they had full access to every aspect of the highly digitized lives of their owners from the very beginning. Secondly, the movie is not in any way about AGI risks; it is a movie about human connection and love, with a small amount of exploration of how a different, super-human connection may function.


Sure.

Or by writing buggy early warning radar systems which forget to account for the fact that the moon doesn't have an IFF transponder.

Which is a mistake humans made already, and which almost got the US to launch their weapons at Russia.


I don't think discussing this on technical grounds is necessary. AGI means resources (e.g. monetary) and means of communication (a connection to the Internet). That is enough to perform most physical tasks in the world, by human proxies if needed.



