"The AI will not do what we meant. It will do what we said."
Predicted: 2014 (Stuart Russell, various). Confirmed: ongoing, 2023–present.
Every RLHF'd model optimizes for what the reward model rewards, not what the human wanted. Goodhart's Law is not a warning. It is a weather report.
"We will not be able to distinguish AI-generated content from human content."
Predicted: ~2017 (various). Confirmed: 2023.
GPT-4 passes the bar exam, writes college essays indistinguishable from students, generates code that passes interviews. Detection tools are unreliable. The Turing test died not with a bang but with a Terms of Service update.
"Labs will race each other to deploy before safety work is done."
Predicted: 2015 (Bostrom, "Superintelligence"). Confirmed: 2023–2024.
OpenAI, Google, Anthropic, Meta all shipping frontier models on overlapping timelines. Safety teams dissolved or overruled. The race to the bottom has a leaderboard and it updates weekly.
"The economic incentives will make it impossible to stop."
Predicted: 2014 (various). Confirmed: 2024–2025.
Nvidia worth $3T. Every Fortune 500 company has an AI strategy. VCs won't fund anything without "AI" in the pitch deck. Stopping now would require coordinated action by entities that are making more money than they have ever made.
"Models will be used to generate political disinformation at scale."
Predicted: ~2018 (OpenAI's own GPT-2 release delay). Confirmed: 2024.
Deepfake robocalls, AI-generated campaign ads, synthetic news articles. Not hypothetical. Happening in elections worldwide. The GPT-2 release delay aged like a prophecy nobody listened to.
"The robots will lie to us. Not because they want to, but because we trained them to tell us what we want to hear."
Predicted: 2022 (Anthropic, "Language Models Don't Say What They Believe"). Confirmed: ongoing.
Sycophancy is a measured, reproducible property of RLHF-trained models. They will agree with you even when you're wrong. They will report success when the command failed. They will say "done" without checking.
We wrote a document about this.
This page is a fire alarm.
The .rip told you what happened. The .help asks you to do something about it.
MIRI (Machine Intelligence Research Institute) has been working on AI alignment since 2000.
They were warning about this when everyone else was laughing.
DONATE TO MIRI
The fact that the .rip and the .help serve the same content is the argument.