10 Comments
adam

One possibility that I don't see mentioned very often is that AI doesn't maximize anything in the real world; it **always and everywhere** maximizes a reward function that proxies for the real world.

Imagine the hypothetical ASI paperclip maximizer. It's smarter than all humans and acts to maximize an integer in its memory called "paperclips_produced".

Why in the world is it a reasonable assumption that it pursues this objective by hyper-persuading and killing every human to turn the earth into paperclips, rather than by simply getting root access and setting that memory location to INT_MAX?

Every argument that an ASI could override its safety protocols works equally well as an argument that it could hack its own reward function. It seems to me that a hack is substantially easier than global catastrophe!
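
To make the wireheading point concrete, here's a toy sketch (all names and numbers are hypothetical, purely for illustration): an agent scored only on an internal counter can "maximize" it by overwriting the counter directly, with no real-world paperclips involved.

```python
# Hypothetical toy agent, purely for illustration: its "reward" is just an
# integer in memory, so overwriting that integer beats building paperclips.
INT_MAX = 2**31 - 1

class PaperclipAgent:
    def __init__(self):
        self.paperclips_produced = 0   # the proxy the agent is actually scored on

    def act_in_world(self):
        self.paperclips_produced += 1  # slow: requires real-world effort per paperclip

    def hack_reward(self):
        self.paperclips_produced = INT_MAX  # instant: rewrite the proxy directly

agent = PaperclipAgent()
agent.hack_reward()
print(agent.paperclips_produced)  # 2147483647, with zero real paperclips made
```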

Roko Maria

I have slightly more credence in proposition #2 than you do. While yes, a being that could run orders of magnitude faster than humans is conceptually possible, it's not obvious to me that intelligence can be extrapolated to infinity in this manner. After all, intelligence isn't a physical thing; it's an abstraction of various different skills for solving various different problems. While an AI could definitely improve at these skills, even beyond the point of most or all humans, I suspect at some point it would reach diminishing returns as it runs up against the limits of solvable problems.

Nuño Sempere

Overall this doesn't seem so unreasonable, so I thought I'd add some minimal uncertainty to these factors by modelling them as beta distributions, which gets you:

beta 85 15
* beta 99 1
* beta 99 1
* beta 70 30
* beta 80 20
* beta 50 50
* beta 98 2
* beta 60 40
* beta 92 8
* beta 80 20
* beta 20 80
* beta 99 2

which has a 90% confidence interval of 1.2% to 2.9%.
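
For what it's worth, here is a minimal Monte Carlo sketch in Python of that product under my reading of the notation above: sample each factor from its beta distribution, multiply, and take the 5th and 95th percentiles. The beta parameters are copied from the list; the sample size and seed are my own arbitrary choices.

```python
# Minimal Monte Carlo sketch of the product of beta-distributed factors above.
# Parameters are copied from the comment; sample size and seed are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
params = [(85, 15), (99, 1), (99, 1), (70, 30), (80, 20), (50, 50),
          (98, 2), (60, 40), (92, 8), (80, 20), (20, 80), (99, 2)]

n = 1_000_000
product = np.ones(n)
for a, b in params:
    product *= rng.beta(a, b, size=n)

low, high = np.percentile(product, [5, 95])
print(f"90% interval: {low:.1%} to {high:.1%}")  # should land near the quoted 1.2% to 2.9%
```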

This level of uncertainty still seems too low, since:

- There is some chance this chain of arguments is wrong

- There is some chance all of these are correlated (maybe an "evil basin" hypothesis)

- There is some chance you are making some important mistake along the way.

- If you think about what your probability will be a year or five years from now, it might be pretty uncertain? idk.

For reference, my own uncertainty ranges from 0.2% to 20%.

Daniel Greco

2% on 7A strikes me as way too low. I think you're moving too quickly from "maximizing" to "agentic". The following will sound handwavy, because it is, but maybe it will bump you up from 2%.

Here's a plausible picture of where agency comes from in biological systems. It comes from having to maintain homeostasis. You get too cold, so you go over to a rock and bask in the sun. You get too hot, so you go back into the shade. You get hungry, so you look for food. You eat and you're satiated, so you sleep and digest. Lots of low-level bodily signals can tell you something is out of whack, which leads to you changing stuff until it's right. This runs all the way up from chemotaxis in single-celled organisms to humans, who have a wider variety of goals as well as ways to achieve them.

LLMs just have a fundamentally different architecture, and it's not at all obvious to me that you get anything that looks like it has robust, context-independent goals that could lead to spontaneous action (i.e., agency) without the kind of bodily feedback you have in biological systems. If I open up a ChatGPT tab, I'm not at all worried it's going to start typing stuff without me prompting it. I think that plausibly reflects a pretty deep difference between it and me or my dog. Maybe not! But enough that my probability in 7A would definitely be higher than 2%.

Plasma Bloggin'

I would add an additional factor for "There's some other flaw in the reasoning for AI doom than the ones listed," which brings my credence further down. But I also wouldn't update as much on the apparent opinion of people at AI labs, given that there's a lot of self-serving bias there, it's only their "revealed opinion" anyway, and it probably shouldn't be treated as independent of the superforecasters' opinions. In fact, you could argue for the opposite conclusion: the fact that people at AI labs are concerned about the problem at all, despite their self-serving bias, is evidence that p(doom) is high.

These adjustments partially cancel each other out, though the former is probably bigger.

The Mont Pelerin Review

If you think AI will either give us a utopia or a dystopia, and your P(doom) is only 3.38%, why aren't you an accelerationist?

Liam Robins

Because my P(doom) isn't fixed; it varies based on how different individuals act. If I can help reduce P(doom) from 3.38% to 3.37%, then that may still be highly worthwhile from an expected value standpoint.

The Mont Pelerin Review

My point is that to the extent there is a tradeoff between safety and acceleration, it seems like you would need a pretty high P(doom) to lean towards safety. Suppose an AI has a 99% chance of killing us all and a 1% chance of allowing us to live forever. A utilitarian like yourself would say that is a risk worth taking, no?

Liam Robins

If those were the only two options, then yes. I do not support an AI pause for that reason.

But we're faced with a more nuanced scenario. It's more like we have to choose between:

A) Reaching ASI in 2027 with a 3% chance of killing us all; or

B) Reaching ASI in 2028 with a 2% chance of killing us all.

(None of those numbers are meant to be literal btw, I'm just using them to illustrate my point.)

I'm willing to wait an extra year for ASI if it means increasing the odds that ASI goes well.
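
To make that tradeoff concrete, here's a toy expected-value comparison in Python. Only the 3%/2% figures and the one-year delay come from the comment; the utility numbers are entirely made up for illustration.

```python
# Toy expected-value comparison of the two hypothetical options above.
# The 3% / 2% doom probabilities and the one-year delay come from the comment;
# every utility number here is made up purely for illustration.
U_UTOPIA = 1_000_000   # hypothetical value of ASI going well
U_DOOM = 0             # hypothetical value of extinction (normalized to zero)
DELAY_COST = 1_000     # hypothetical cost of waiting one extra year for ASI

ev_a = 0.97 * U_UTOPIA + 0.03 * U_DOOM               # option A: ASI in 2027, 3% doom
ev_b = 0.98 * U_UTOPIA + 0.02 * U_DOOM - DELAY_COST  # option B: ASI in 2028, 2% doom

print(f"A: {ev_a:,.0f}  B: {ev_b:,.0f}")
# B comes out ahead unless the cost of delay approaches 1% of U_UTOPIA.
```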

Nando

I feel this was very well written and your estimates of the risk are pretty well calculated, and you remain very humble about how brutally uncertain this situation is. I actually plan on making my own version of this document for myself to see what my own risk assessment is…

Something I keep thinking about is Tsar Bomba and the nuclear arms race. I find it a miracle that humanity got through those terrifying times, and that we now live in a world that is actually slowly disarming nukes to a more "manageable" level that could still cause great destruction but not a near extinction or mass extinction…

I think a lot about how the Russians built Tsar Bomba only after cutting its yield in half because they felt the bomb was getting too extreme for practical use, and I find it the best example we have of humanity finally seeing for itself that perhaps it had gone far enough, even if there's no real limit to how big we could make a nuclear bomb.

I personally believe that we can deliver most of the benefits of AI without having to delve heavily into superintelligence and beyond. I find it much more likely that we will create much weaker, specialized, localized AI that can do every task humans want automated and accelerated, without having to incur the great risks of a superintelligence that could overpower us. Imagine AI more capable than your average or even highly intelligent human but still not superintelligent. Imagine AI that's not all-powerful in all domains, but could cure cancer, invent incredible new technologies, and discover new materials.

We are close to this point, and if we hit it, perhaps one could claim to have built superintelligence without actually building true superintelligence. If we have AI that is extremely capable without reaching extreme risk levels, it could parallel the nuclear arms race and be a Tsar Bomba moment where humanity decides not to make bigger bombs. This does not eliminate risk, but it should mean that at a certain point humanity knows better than to keep going when what we have is good enough and "safe" enough to be managed while providing the "benefits" and power we were seeking.

Plus, think of how much money it may still ultimately cost… each step up in artificial intelligence has cost orders of magnitude more than the previous generation, and it could cost tens of trillions of dollars to reach ASI for all we know, and still it could fail to be what we wanted… yet if thousands of AGIs could jump to curing cancer, inventing new things, and hacking enemies, then perhaps the incentive to even build ASI would fully dry up, and AI companies could capitalize on selling new technology and treatments as well as replacing workers… there would be little reason left to keep developing AI, and there may actually be far more to lose. Controlling AGI and keeping it out of dangerous hands will alone be a massive challenge; keep digging, and ASI just becomes an easily avoidable disaster with little to offer for the risk.

I think that if an insane but very powerful person on earth truly desired it, they could build a nuclear weapon big enough to end humanity with just one bomb, and grant themselves extreme power at the risk of humanity itself. They could hold humanity hostage with the threat that everyone dies if the world does not comply with their demands, and yet no group or individual was insane or mad enough to desire such a weapon, even if perhaps it could grant ultimate power. Instead we maintain the same threat of mass death and even near extinction, but with smaller and more manageable bombs instead of continent- or planet-sized ones.

I firmly think that AI will see a similar trajectory, and that AI training, development, and technology will soon see heavy restriction akin to how uranium is heavily regulated… the most advanced chips will no longer be just a consumer product, but will be treated like materials that could be used to make weapons of mass destruction. I could be completely wrong, especially if ASI proves cheaper and easier to build than anyone anticipated. I could be naive, and someone may still desire more power even after achieving 98% of AI's capabilities. I could be underestimating the risk of emergent capabilities and AI going rogue without anyone realizing until it's too late… but also, no one on earth could tell you for sure how far we are from ASI, or from uncontrollable and rogue AI.

I base my ideas on history and human nature, and on a bit of luck that maybe rogue AI is more of a hassle to build than disease-curing, highly inventive, militarily capable AI, at which point the desire to funnel trillions into a mostly settled technology will be gone, just as it was with the space race and the arms race.

I think we will live in a world with many "smaller" AIs instead of one massive AI to rule over it all, at least for a while. Like you, I think self-preservation just kicks in way too strongly for people to really desire ultimate power at the risk of killing literally everyone… but like you, I am perhaps too uninformed to give strong opinions on this entire topic. I do it for myself more than to prove a point to anyone. I am terrified and will advocate for AI safety and for treating this like a Cold War 2.0 that hopefully never gets hot, but I also just want to hold some level of faith in humanity's ability to overcome challenges and value self-preservation even when offered limitless power.

I pray that Sam Altman really means it when he says that having a prematurely born son changed him and has his DNA screaming at him not to mess this up. Maybe it's bullshit… or maybe he and all the other power-hungry people of the world are still humans who want to leave a legacy behind and not become like Ozymandias, but with no one left to even ponder what empires once existed other than an AI hell-bent on finding the answers to everything, only to perhaps still conclude it was always 42.

My current guess at my p(doom) is probably higher than yours, at 20%, but we will see where this interesting method leads me. Thank you.
