Thanks @Austin!
Brian Tan
8 months ago
We at WhiteBox Research have been progressing well since we received our initial regrant from Renan Araujo, and we’d like to share some updates below! (We will have a strategy session this week, and we’ll share another update within the next two weeks about our next steps and how others can help us.)
Here are our key achievements since September 2023 (up to March 19, 2024):
In November, we were approved for $61,460 in funding from the Long-Term Future Fund! Together with our funding from Manifund, this funds us until around August 2024.
We finalized more details of our program and named it the WhiteBox AI Interpretability Fellowship. It’s a five-month training and research program in Manila to master the fundamentals of mechanistic interpretability. We created this primer for the fellowship, and our training phase’s curriculum overview can be found here. [1]
We got 53 applicants and accepted 13 participants into our fellowship, surpassing our goal of getting 50 applicants and 12 participants. [2] [3] Our marketing and application process also helped us start building a wider community of people interested in AI interpretability. [4]
We onboarded Kyle Reynoso as a part-time teaching assistant in February, and he has contributed significantly since then. [5]
We ironed out a smoother process for participants to view, submit, and receive feedback on their exercise answers via GitHub Classroom and nbgrader (see the sketch after this list).
We’re in the fourth week of our fellowship’s two-month training phase. So far, we’ve received generally positive feedback on the three Saturday sessions and the two-night retreat we held for participants, and we’ve maintained a consistent weekly tempo of adjustments and improvements to various aspects of the program.
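For a concrete sense of that workflow, here’s a minimal sketch of how an autograded exercise can look under nbgrader’s standard conventions. The `attention_scores` exercise is a hypothetical example for illustration, not one taken from our actual curriculum:

```python
# Instructor version of a source cell: nbgrader strips everything between
# the solution markers when generating the student release of the notebook.
import torch

def attention_scores(q: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    """Compute scaled dot-product attention scores."""
    ### BEGIN SOLUTION
    return (q @ k.transpose(-2, -1)) / k.shape[-1] ** 0.5
    ### END SOLUTION

# Autograded test cell: nbgrader re-runs these asserts against each
# submission and awards the cell's points automatically.
q = torch.randn(4, 8)
k = torch.randn(4, 8)
expected = (q @ k.T) / 8 ** 0.5
assert torch.allclose(attention_scores(q, k), expected)
```

GitHub Classroom then gives each participant a private copy of the assignment repo, so collecting submissions and returning graded feedback all happens through one pipeline.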
Here are some footnotes to expound on our progress above:
[1] Since we opted for less experienced but high-potential participants (their average age is 21), we will probably have to cover more prerequisites than other programs (e.g., ARENA) do, which means we may only delve deeper into interpretability in the research phase of our program in June.
[2] We opted for a three-stage application process. Stage 1 involved solving Bongard problems and answering essay questions about alignment material (namely Ajeya Cotra’s Saint/Schemer/Sycophant post and Lee Sharkey’s Circumventing Interpretability post). Stage 2 tested applicants’ ability to solve coding problems that are tricky even with GPT-4’s help, and Stage 3 consisted of an unstructured interview largely based on the format of Tyler Cowen’s podcast Conversations with Tyler.
[3] Some of the 13 people we accepted include an IOI silver medalist, a 19-year-old who recently got seed funding for his B2B startup, a fluent Lojban speaker who did contract work for an OpenPhil AI safety grantee, and a Master's student who won a gold medal in Taiwan for chemistry research she did in high school.
[4] We spent around a month on marketing and community building to attract people to apply for current and/or future cohorts of our fellowship. We ran a successful virtual “Career Planning in the Age of AI” salon at the start of the year with around 27 attendees, four of whom we ended up accepting into the fellowship. We also started a community Discord server where people from the ambient community can interact with our participants and discuss all sorts of questions, as a preliminary step towards wider community building in Southeast Asia. (We sent an invite to our server to all applicants, including those we rejected, some of whom already have more background in ML.)
[5] Our TA, Kyle Reynoso, graduated with the highest honors as a CS major at the top university in the country, was an officer in a local student EA chapter, and works as an ML engineer for a marketing compliance AI startup in Australia.
Brian Tan
about 1 year ago
Sorry for the late reply here, but thanks Austin! (Oh and to clarify, Clark set up the prediction market, not Kriz!)
Brian Tan
about 1 year ago
Hi Renan, we really appreciate your decision to give us a regrant! Thanks also for sharing your thoughts about our project. We're taking your challenges/concerns into account, and we're already coming up with concrete plans to mitigate them.
Brian Tan
about 1 year ago
Hi Gaurav, thanks for weighing in on our project! Here are our thoughts on what you said, written mainly by Clark:
We agree there’s value in visiting Berkeley for those who have the means, but we think it’s important that there be more alignment hubs in various regions. A good number of potential AIS researchers in Southeast Asia would find it costly and/or hard to visit or move to Berkeley (especially in the current funding landscape), compared to visiting or working in Manila / SE Asia.
On research sprints to solve concrete open problems (COPs): there are nuances to speed. Optimizing for paper-writing speed, for example, doesn’t make sense, nor would treating the problems as Leetcode puzzles you can grind. The kind of speed we’re optimizing for is closer to rate of exploration: how can we reduce our key uncertainties in a topic as quickly as possible? Can we discover all the mistakes and dead ends ASAP to crystallize the topic’s boundaries rapidly? Can we factor an open question into two dozen subquestions, each clearly doable in one sitting, and if so, how many of them can we do in a given timeframe? The crucial point is this: moving around produces information. We want to ruminate on questions in the middle of coding them up, develop the habit of thinking through problems in the space of a Jupyter notebook, and shrink this loop until it becomes second nature. We have also emailed Neel Nanda and Joseph Bloom about our project and aim to get their advice, so we won’t veer too far off course while still learning to walk on our own.
On mentorship: we expect to do well enough in the training phase, but we will likely need more mentorship in the research phase. That’s why we’re going to get a research adviser. During the research phase, the students will (mostly) get advice from Clark and Kriz, while we take advice from the research adviser. The goal is to eventually train ourselves and/or get enough people on our team so that we can confidently do the advising ourselves. This is also why we’re adopting the flipped classroom model: we’ll only have to produce/curate the learning materials once, and can then focus on guiding participants through the exercises. We’re quite confident this is doable, as Clark has taught classes of more than 40 people before.
Let us know if you have more thoughts or questions!
| For | Date | Type | Amount (USD) |
|---|---|---|---|
| Manifund Bank | about 1 year ago | withdraw | 4,696 |
| Manifund Bank | about 1 year ago | withdraw | 3,028 |
| Manifund Bank | about 1 year ago | withdraw | 4,696 |
| WhiteBox Research: Training Exclusively for Mechanistic Interpretability | about 1 year ago | project donation | +100 |
| WhiteBox Research: Training Exclusively for Mechanistic Interpretability | about 1 year ago | project donation | +150 |
| WhiteBox Research: Training Exclusively for Mechanistic Interpretability | about 1 year ago | project donation | +70 |
| WhiteBox Research: Training Exclusively for Mechanistic Interpretability | about 1 year ago | project donation | +100 |
| WhiteBox Research: Training Exclusively for Mechanistic Interpretability | about 1 year ago | project donation | +12,000 |