Luthien

Technical AI safety · Global catastrophic risks

Jai Dhyani

Active Grant
$45,470 raised
$500,000 funding goal

Project summary

  • Luthien is a non-profit developing AI Control for immediate real-world deployments, based on Redwood Research's AI Control agenda. https://luthienresearch.org/

What are this project's goals? How will you achieve them?

  • Luthien's ultimate goal is to increase the probability that effective AI Control systems will be deployed to mitigate catastrophic risks from frontier AI systems when they are developed.

  • We think getting real-world feedback ASAP on how AI Control systems perform in real-world situations and iterating aggressively to develop practical, effective Control systems significantly increases the odds that effective AI Control systems will be deployed to mitigate otherwise-catastrophic risks.

  • To that end, we're developing AI Control systems targeting prosaic, lower-stakes scenarios, like occasionally-misbehaving coding assistants. By doing so, we hope to discover and solve the types of unforeseen problems that emerge when any new type of system is deployed, develop a playbook for effective AI Control deployments, establish standards and best practices for effective AI Control systems, and test and develop effective Control strategies in real-world situations. Additionally, we want to see how far we can push automated red/blue-teaming to develop effective strategies quickly. (A minimal illustrative sketch of this kind of control layer follows this list.)

  • Luthien's secondary goal is to establish an AI Safety presence in Seattle, where there is currently a great deal of latent talent but very few opportunities to onboard into the AI Safety space.
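
To make the shape of such a deployment a bit more concrete, below is a minimal sketch of the trusted-monitoring pattern from the AI Control literature applied to a coding assistant. This is an illustration only, not Luthien's actual implementation; untrusted_model, trusted_monitor, and the thresholds are hypothetical stand-ins.

```python
# Illustrative sketch of a trusted-monitoring control layer for a coding
# assistant, in the style of Redwood Research's AI Control protocols.
# untrusted_model and trusted_monitor are hypothetical callables standing in
# for API clients; the names and thresholds are assumptions, not Luthien's code.
from dataclasses import dataclass
from typing import Callable


@dataclass
class ControlDecision:
    action: str       # "allow", "escalate", or "block"
    suspicion: float  # monitor's suspicion score in [0, 1]
    output: str       # code ultimately surfaced to the user ("" if blocked)


def controlled_completion(
    prompt: str,
    untrusted_model: Callable[[str], str],
    trusted_monitor: Callable[[str, str], float],
    escalate_threshold: float = 0.5,
    block_threshold: float = 0.9,
) -> ControlDecision:
    """Route an untrusted coding assistant's output through a trusted monitor."""
    proposed_code = untrusted_model(prompt)
    suspicion = trusted_monitor(prompt, proposed_code)

    if suspicion >= block_threshold:
        # Highly suspicious output is withheld entirely.
        return ControlDecision("block", suspicion, "")
    if suspicion >= escalate_threshold:
        # Moderately suspicious output is surfaced only after human review.
        return ControlDecision("escalate", suspicion, proposed_code)
    return ControlDecision("allow", suspicion, proposed_code)
```

In trusted-monitoring-style protocols the monitor is typically a cheaper, trusted model, and escalated outputs are routed to human auditors or replaced by a trusted model's own completion; the sketch above shows only the routing skeleton.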

How will this funding be used?

  • Funding will be used for salary, API credits, other compute infrastructure, and org logistics like attending ControlConf.

  • Our minimal funding goal ($500) buys us a slightly longer runway for those same costs, plus hiring contractors for part-time work.

  • At current funding levels, Luthien is on the verge of being able to hire a second person. At $5,000 I'll be reasonably confident that Luthien can hire a second person full-time, because (1) I'll take it as a signal that a relatively small investment of time in fundraising can net enough donations to more than pay for the time lost, and (2) with two people there will be slightly more slack, so it should be possible to do more focused deep work without devoting a large fraction of org resources to fundraising.

  • (Update May 2025) After running budget scenarios over the next year I've gotten significantly more conservative about how much liquidity Luthien needs to expand hiring. Right now my main priority is keeping the runway long enough to develop a useful thing and start iterating on feedback. I do think that an expanded team can help with this, but with ramp-up time it would take at least several months for that to pay off, and I don't want to make that trade until I'm confident that we have enough liquidity to make it to iterating-on-real-world-feedback while paying the full team. Those thresholds are approximately as follows:

  • $100k: Additional hire (software development focused)

  • $225k: Two additional hires (software development focused)

  • $350k: Three additional hires (software development focused)

  • $500k: Four additional hires (three software development focused, one project management/organization/meta/outreach focused)

  • Between (and beyond) these tiers, donations buy runway, and therefore focused deep-work time not spent on fundraising.

  • These amounts will in practice fluctuate as development continues (we'll need less additional runway to get to the point of iterating on feedback).

  • Past $100k it potentially becomes possible to hire a third person.

Who is on your team? What's your track record on similar projects?

  • Luthien currently consists of me, Jai Dhyani. I'm a former ML software engineer at Meta, a MATS alum, and a co-author of "RE-Bench: Evaluating frontier AI R&D capabilities of language model agents against human experts".

What are the most likely causes and outcomes if this project fails?

  • We're unable to develop Control solutions that deliver enough benefit to justify the requisite costs in money, latency, and complexity for real-world use cases.

  • Immediate real-world use cases turn out to be sufficiently different from high-stakes scenarios across all relevant dimensions that developing and deploying effective Control systems for the former yields little to no payoff for the latter.

How much money have you raised in the last 12 months, and from where?

~$187k. $150k of this is a one-time seed grant from the AI Safety Tactical Opportunities Fund meant to get us off the ground and spur further donations. The remainder is from individuals donating ~$5k-$20k each.

Donations

  • JueYan Zhang: $25K (2025-05-09)

  • Ryan Kidd: $20K (2025-04-01)

  • Vincent Weisser: $200 (2025-04-01)

  • Vincent Weisser: $200 (2025-04-01)

  • Romain Deléglise: $70 (2025-04-01)