Project summary
The ML Alignment & Theory Scholars (MATS) Program is an educational seminar and independent research program that aims to provide talented scholars with talks, workshops, and research mentorship in the field of AI alignment, and to connect them with the Berkeley AI safety research community.
MATS helps expand the talent pipeline for AI safety research by empowering scholars to work on AI safety at existing research teams, found new research teams, and pursue independent research. To this end, MATS connects scholars with research mentorship and funding, and provides scholars with a seminar program, office space, housing, research coaching, networking opportunities, community support, and logistical support. MATS supports mentors with logistics, advertising, applicant selection, and complementary scholar support systems, greatly reducing the barriers to research mentorship.
What are this project's goals and how will you achieve them?
Find + accelerate high-impact research scholars:
Pair scholars with research mentors via specialized mentor-generated selection questions;
Provide a thriving academic community for research collaboration, peer feedback, and social networking;
Develop scholars according to the “T-model of research” (breadth/depth/taste);
Offer opt-in curriculum elements, including seminars, research strategy workshops, 1-1 research coaching, peer study groups, and networking events.
Support high-impact research mentors:
Scholars are often good research assistants and future hires;
Scholars can offer substantive new critiques of alignment proposals;
Our operations and community free up valuable mentor time and increase scholar output.
Help parallelize high-impact AI alignment research:
Find, develop, and refer scholars with strong research ability, value alignment, and epistemics;
Use alumni for peer mentoring in later cohorts;
Update mentor list and curriculum as the needs of the field change.
How will this funding be used?
We are seeking general support funds for MATS, including future MATS cohorts, scholar stipends, scholar housing and accommodation, office rental, food, computing, events, travel, contractor labor, and payroll.
The most recent MATS cohort hosted 60 scholars and 15 mentors. It costs approximately $35k to fund one scholar through the entire program (not including staff time spent on program elements), or roughly $2.1M for a 60-scholar cohort.
Who is on your team and what's your track record on similar projects?
We are a small team of eight full-time staff members. An organization chart can be found here.
MATS has successfully run five cohorts over the last two years, including two extension programs, scaling our twice-yearly program from 30 scholars and 5 mentors to 60 scholars and 15 mentors. Past mentors are listed here.
Recent success stories:
Alumni have been hired by leading organizations like Anthropic, OpenAI, Google DeepMind, MIRI, ARC, Conjecture, the UK Frontier AI Taskforce, and the US government, and joined academic research groups like UC Berkeley CHAI, NYU ARG, and MIT Tegmark Group.
Alumni have founded AI safety organizations, including Apollo Research, Athena, Cadenza Labs, the Center for AI Policy, Leap Labs, Timaeus, and Stake Out AI.
Alumni have pursued independent research with funding from the Long-Term Future Fund, Open Philanthropy, Lightspeed Grants, Manifund, and the Foresight Institute.
Recent papers featuring scholars’ research include “Copy Suppression: Comprehensively Understanding an Attention Head”, “How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions”, “Sparse Autoencoders Find Highly Interpretable Features in Language Models”, “Language Models Represent Space and Time”, “Evaluating Language-Model Agents on Realistic Autonomous Tasks”, “Representation Engineering: A Top-Down Approach to AI Transparency”, and “Taken out of context: On measuring situational awareness in LLMs”.
Scholars have helped develop new AI alignment agendas, including activation engineering, externalized reasoning oversight, conditioning predictive models, developmental interpretability, defining situational awareness, formalizing natural abstractions, and neurotech for alignment.
Recently, mentor Jeffrey Ladish and his MATS scholars fine-tuned Llama 2-Chat to reverse safety training for under $200. This case was brought before the US Senate (min 42:30) to demonstrate how easily and cheaply bad actors might remove safeguards from language models.
What are the most likely causes and outcomes if this project fails? (premortem)
Poor selection of scholars would waste mentor time + grant money. How we address this:
We defer to best-in-class research mentors on applicant selection rather than selecting applicants ourselves and pairing them with mentors after the fact.
We use difficult, mentor-specific selection questions that provide enough granularity to distinguish among top applicants.
We choose mentors with a proven track record of high-quality research or strong endorsements from the alignment research community.
Some mentors are less engaged or experienced than others, and the mentorship experience might be less helpful for their scholars. How we address this:
We soft cap the number of scholars in streams with low mentor engagement unless the applicants seem particularly self-directed.
Scholar support staff supplement the mentorship experience by providing 1-1 research strategy and unblocking support for scholars.
We hold workshops on research strategy, technical writing, research tools, and more to benefit scholars and offload time mentors might otherwise spend teaching these skills.
There might not be enough jobs or funding for all alumni to receive financial support for their research efforts (sometimes known as the “mass movement building” concern). How we address this:
Some of our alumni’s projects are attracting funding and hiring further researchers. Our alumni have started alignment teams/organizations that absorb further talent (listed above).
With the elevated interest in AI and alignment, we expect more organizations and funders to enter the ecosystem. We believe it is important to install competent, aligned safety researchers at new organizations early, and our program is positioned to help capture and upskill interested talent.
Sometimes, it is hard to distinguish truly promising researchers in two months, hence our four-month extension program. We likely provide more benefits through accelerating researchers than can be seen in the immediate hiring of alumni.
Alumni who return to academia or industry are still a success for the program if they do more alignment-relevant work or acquire skills for later hiring into alignment roles.
Scholars might overly defer to their mentors and fail to critically analyze important assumptions, decreasing the average epistemic integrity of the field. How we address this:
Our scholars are encouraged to “own” their research project and not unnecessarily defer to their mentor or other “experts.” Scholars have far more contact with their peers than mentors, which encourages an atmosphere of examining assumptions and absorbing diverse models from divergent streams. Several scholars have switched mentors during the program when their research interests diverged.
We require scholars to submit “Scholar Research Plans” one month into the in-person phase of the program, detailing a threat model they are targeting, their research project’s theory of change, and a concrete plan of action (including planned outputs and deadlines).
We encourage an atmosphere of friendly disagreement, curiosity, and academic rigor at our office, seminars, workshops, and networking events. Including a diverse portfolio of alignment agendas and researchers from a variety of backgrounds allows for earnest disagreement among the cohort. We tell scholars to “download but don’t defer” in regard to their mentors’ models. Our seminar program includes diverse and opposing alignment viewpoints. While scholars generally share an office room with their research stream, we split streams up for accommodation and discussion groups to encourage intermingling between streams.
What other funding are you or your project getting?
We generally seek funding from multiple sources, including Open Philanthropy and the Survival and Flourishing Fund. We recently received a grant from Open Philanthropy supporting MATS runway costs for 12 months; this does not cover the cost of running another cohort.