A threshold – Yudkowsky
The specific claim that there will be a turning point is a crucial one separating this worldview from others. Yudkowsky and those who share this belief tend to think that intelligence — in entities both artificial and biological — has a critical point: call it generalization, or coherence, or reflectivity, or whatever it is that separates humans from chimpanzees. Humans, possessing this ineffable quality, have built civilizations that wildly surpass anything any other species could do. AIs think faster than us, and unlike us they can copy themselves and adjust their own minds. Once they cross that threshold, they'll surpass us fast.
We have one chance
In practice this means we only get one chance to align a superintelligent AI — and if we can't do that in time, one response is to stop AI development entirely.
Open Philanthropy
The bigger problem might be that RLHF, and similar techniques, fundamentally teach AIs to say what we want to hear, not to do what we’d want them to do if we had full context on their decision-making.
we’re not training AI systems to do what we want, but to tell us what we want to hear.
Imagine the Catholic Church circa 1500 trying to train an AI. If the AI correctly reported that the Earth revolves around the Sun, it would be rated more negatively than if it said the opposite.
AIs trained this way will have every incentive to manipulate us, and to hack and falsify the mechanisms we use to monitor them.
It's crucial to detect whether your AI is actually aligned, and to understand what current AIs are capable of.
In practice this means building tools to catch AIs that try to deceive humans. Many of these tools involve figuring out what a model is "really thinking," whether by looking directly at its weights or by verifying certain mathematical properties of its behavior.
Second, the Open Philanthropy worldview isn't premised on the assumption that there will be a "hard takeoff" where AIs rapidly become superintelligent. Instead of a single intelligence switch that can be flipped on or off, they think that AIs will probably get gradually smarter.
They tend to be less concerned with raw intelligence than with the resources and information AIs have access to.
If superintelligent AIs outnumber humans, think faster than humans, and are deeply integrated into every aspect of the economy, an AI takeover seems plausible — even if they never become smarter than we are. This means that decisions about how AIs are deployed also have important implications for safety. The more control humans choose to retain over things like the supply chains that produce microchips, the harder it will be for AI to defeat us.
Optimistic
We'll solve alignment incidentally along the way to building commercially valuable AI systems.