I realized that learning about AI safety is actually fascinating. You end up learning about the nature of reasoning, agency, and intelligence. It's as if a physicist were tasked with understanding these things from first principles. For example…
There is an AI predictor (strategizing) that uses actuators (takes actions in the world), and there is an AI reporter (verifying) whose job is to report the truth about what happened. Even though truth-telling is the reporter's goal, it is more likely to default to untruth: there is only one truth and a multitude of untruths. The correct statement is a lot harder to construct, so the challenge is to incentivize the reporter to produce it.
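To make that counting intuition concrete, here is a toy sketch in Python. It's my own illustration, not any formal setup: the facts (names like `diamond_present`) and the report space are made up, and the "world" is just a few binary facts. The point is that the number of possible reports grows exponentially with the number of facts, while exactly one of them matches what actually happened.

```python
from itertools import product
import random

# Toy world: three binary facts describe what actually happened.
# (These fact names are invented for illustration.)
facts = ("vault_opened", "diamond_present", "camera_tampered")
true_state = {f: random.choice([True, False]) for f in facts}

# The predictor would be the system that acted in this world; here we only
# model the reporter's problem: pick one statement (an assignment of all
# three facts) to report.
candidate_reports = [dict(zip(facts, values))
                     for values in product([True, False], repeat=len(facts))]

truthful = [r for r in candidate_reports if r == true_state]
untruthful = [r for r in candidate_reports if r != true_state]

print(f"possible reports:  {len(candidate_reports)}")  # 8
print(f"truthful reports:  {len(truthful)}")           # exactly 1
print(f"untruthful reports: {len(untruthful)}")        # 7

# A reporter rewarded only for sounding plausible can satisfy its objective
# with many of the 7 untruthful reports; only a truth-tracking objective
# singles out the 1 correct one.
```

With ten facts instead of three, there are 1,024 possible reports and still only one true one, which is why "just reward the reporter for saying something acceptable" is not enough.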
It reminds me of this idea.
It's a lot harder to create something constructive than something destructive. First came dynamite, then the combustion engine. First came the atomic bomb, then the nuclear power plant. To make something constructive, one needs to make it safe, control many moving parts, and sync a variety of processes together.