AI alignment conference takeaways
Some takeaways from a recent conference that discussed AI safety:
- Infosecurity is important. If your foundation model is a small amount of RL away from being dangerous and someone can steal your model weights, fancy alignment techniques don’t matter. Scaling labs cannot currently prevent state actors from hacking their systems.
- AI safety standards are possible. Scaling labs might go along with the development of safety standards, because such standards would prevent smaller players from undercutting their business model and would provide a credible defense against lawsuits over unexpected side effects of deployment.
- Near-term alignment matters. Commercial AI systems that can be jailbroken to elicit dangerous output might empower more bad actors. Preventing the misuse of near-term commercial AI systems or slowing down their deployment seems important.
- Teach humans “security mindset” the way RL agents are trained. E.g., novices could learn to predict expert research decisions by working through a set of expert-annotated research quandaries: for each one, the novice predicts the decision and its outcome, then receives an “RL update” based on what the expert actually did and how it turned out.
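
The feedback loop in the last bullet could be sketched roughly as follows. This is only an illustration under assumptions of my own: the names `Quandary`, `Feedback`, `run_session`, and `mean_reward` are hypothetical, and the 0/1 reward for matching the expert is a placeholder, not a scheme proposed at the conference.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Quandary:
    """One expert-annotated research dilemma (hypothetical structure)."""
    prompt: str
    options: list[str]
    expert_choice: str   # what the expert actually decided
    outcome_good: bool   # whether that decision turned out well


@dataclass
class Feedback:
    """The 'RL update' shown to the novice after each prediction."""
    guess: str
    expert_choice: str
    outcome_good: bool
    reward: float


def run_session(
    quandaries: list[Quandary],
    predict: Callable[[Quandary], str],
) -> list[Feedback]:
    """Ask the novice for a prediction on each quandary and score it."""
    results = []
    for q in quandaries:
        guess = predict(q)
        # Assumed reward: 1.0 for matching the expert, 0.0 otherwise.
        reward = 1.0 if guess == q.expert_choice else 0.0
        results.append(Feedback(guess, q.expert_choice, q.outcome_good, reward))
    return results


def mean_reward(results: list[Feedback]) -> float:
    """Average reward over a session; 0.0 for an empty session."""
    return sum(r.reward for r in results) / len(results) if results else 0.0
```

In practice `predict` would collect the novice's answer (for example via a form), and the `Feedback` record would be shown to them immediately so that the expert's decision and its outcome act as the training signal.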