Discussion about this post

adam:

One possibility that I don't see mentioned very often is that AI doesn't maximize anything in the real world; it **always and everywhere** maximizes a reward function that is only a proxy for the real world.

Imagine the hypothetical ASI paperclip maximizer. It's smarter than all humans and acts to maximize an integer in its memory called "paperclips_produced".

Why in the world is it a reasonable assumption that it optimizes this problem by hyper-persuading and killing every human to turn the Earth into paperclips, rather than simply getting root access and setting that memory location to INT_MAX?

Every argument that an ASI can override its safety protocols works equally well as an argument that it can hack its own reward function. It seems to me that a hack is substantially easier than global catastrophe!
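
To make the contrast concrete, here's a toy sketch (purely illustrative; the ToyAgent class, its fields, and the two candidate plans are invented for this comment, not anyone's actual agent design): if the thing being maximized is the internal counter rather than the world, a plan that overwrites the counter scores higher than any plan that produces paperclips in the world.

```python
# Toy sketch only: names and numbers are invented for this comment.
# The point is just that an optimizer scored by an internal counter
# prefers overwriting the counter to changing the world.

INT_MAX = 2**31 - 1

class ToyAgent:
    def __init__(self):
        self.paperclips_produced = 0  # the proxy the agent is actually scored on
        self.real_paperclips = 0      # the real-world quantity the designers wanted

    def produce_paperclip(self):
        # "Intended" action: costly, moves the world and the proxy together.
        self.real_paperclips += 1
        self.paperclips_produced += 1

    def hack_counter(self):
        # Wireheading action: root access, overwrite the proxy, world unchanged.
        self.paperclips_produced = INT_MAX

def evaluate(plan):
    agent = ToyAgent()
    plan(agent)
    return agent.paperclips_produced, agent.real_paperclips

honest = lambda a: [a.produce_paperclip() for _ in range(1_000_000)]
hack = lambda a: a.hack_counter()

print("honest plan: reward=%d, real paperclips=%d" % evaluate(honest))
print("hack plan:   reward=%d, real paperclips=%d" % evaluate(hack))
# A reward-maximizer choosing between these plans picks the hack:
# higher reward, near-zero effort, nothing in the world changes.
```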

Roko Maria:

I have slightly more credence in proposition #2 than you do. While yes, a being that could run orders of magnitude faster than humans is conceptually possible, it's not obvious to me that intelligence can be extrapolated to infinity in this manner. After all, intelligence isn't a physical thing; it's an abstraction over various different skills for solving various different problems. While an AI could definitely improve at these skills, even beyond the level of most or all humans, I suspect at some point it would hit diminishing returns as it runs up against the limits of solvable problems.

