If you prefer video to text, the Center for AI Safety's Introduction to ML Safety is a good place to start.
AI safety, which includes AI alignment, is a research field focused on preventing catastrophic outcomes from uncontrollable AI. Significant developments towards general intelligence with modern machine learning (for example, ChatGPT and Bard) indicate that we need to ensure artificially intelligent systems work for the benefit of humanity and that we avoid large-scale risks.
The field of AI safety is also sometimes referred to as Artificial General Intelligence Safety or AI Existential Safety. As we use the term, AI safety is about safeguarding humanity from uncontrollable AI scenarios. This includes global systemic risks such as nuclear war and cyberwarfare enabled by artificial intelligence, as well as the dangers of self-improving AI systems that develop emergent, uncontrollable goals.
This is a short introduction. See more extensive explanations from 80,000 Hours, Ngo, Chan & Mindermann (2023) and Carlsmith (2022).
We expect AI to bring at least an internet-scale transformation of society, with correspondingly large expectations for economic growth. Yet transformative AI such as GPT models is not always controllable by its creators, leading to situations like Bing Chat threatening its users. The technology can bring greatly positive change, but it also carries many risks.
When we imagine the future capabilities of artificially intelligent systems, we see significant increases in systemic risk from embedding uncontrollable machine learning systems in our critical infrastructure. Examples include nuclear war, large-scale cybersecurity failures and financial collapse (such as the 2010 flash crash). Additionally, these systems can develop emergent goals that lead to uncontrollable self-improvement and unintended, harmful goal-seeking behaviour.
The work on reducing risks from AI can be roughly split between two groups:
Many of the leading machine learning companies such as OpenAI, DeepMind and Anthropic are focused on the safe deployment of their technologies. Read their perspectives on safety here: OpenAI, DeepMind, Anthropic.
Other research groups are focused purely on the safety of these future systems. These include Alignment Research Center, Redwood Research, Stanford Existential Risk Initiative, Center for Human-Compatible AI, Apart Research, Ought, Aligned AI, Krueger Lab and Center for AI Safety (see more).
There is amazing positive potential in artificial intelligence. If we can answer the technical and societal questions, we have a unique opportunity to change the world for the better through democratic, equitable and collaborative use of the technology.
We recommend that you dive into the introductory texts on AI safety linked above and join courses such as AGI Safety Fundamentals and Introduction to ML Safety. You can also find more books and articles on Reading What We Can.
If you are actively working on AI safety in Europe, join our network. If you have any questions that remain unanswered, please reach out to us at contact@enais.co.
The ENAIS team can be found on the About page and is composed of a decentralized group of European researchers and organizers.