Agency and (Dis)Empowerment

Technical AI safety

Damiano Fornasiere and Pietro Greiner

Active Grant

$60,250 raised

$60,000 funding goal

Fully funded and not currently accepting donations.

Project summary

Six months of support for two people, Damiano and Pietro, to write a paper about (dis)empowerment with Jon Richens, Reuben Adams, Tom Everitt, and Victoria Krakovna. This application was suggested by Evan Hubinger.

Project goals

Produce a paper about agency and (dis)empowerment in the context of multi-agent causal models.

Its ultimate aim is to offer formal and operational notions of (dis)empowerment. For example, an intermediate step would be to provide a continuous formalisation of agency, and to investigate which conditions increase or decrease agency.

These definitions should be:

- Conceptually satisfactory: so as to deconfuse the concept and remove ambiguity about it;
- Operationally implementable: both to be able to identify agents from causal models, and to train models (both RL agents and LLMs) in such a way that they are not disempowering, e.g., they do not reduce the agency of other agents (including us).

We anticipate a thorough study of these forms of (dis)empowerment, including a literature review, motivating examples and threat models, mathematical formalisations, sound and complete criteria to characterise (dis)empowerment, and the study of concrete examples.

How will this funding be used?

- $30k: support for 6 months for 2 people;

- $10k: expenses for 3 or 4 trips to London (rent, flights and transport, food, …);

- $20k: taxes.

What is the recipient's track record on similar projects?

Damiano is a PhD student in mathematics. Pietro is an M.Sc. student in mathematics.

A first draft of the project, with preliminary results, was delivered by Victoria Krakovna to Jon Richens and Tom Everitt as part of the selection for Damiano’s research phase of MATS. Jon and Tom agreed to co-supervise the project and to produce a paper from it. Pietro helped significantly with the deliverable by proofreading it and by sharing and discussing ideas. Consequently, he joined the project.

For the preliminary results, the scope of the project was narrowed to offering two counterfactual definitions of disempowerment, providing sound and complete criteria to detect them in causal models, and studying how they relate to each other.

When we say "A is (dis)empowered by a policy π_B of B", we understand that both A and B can be agents, groups of agents, or (parts of) the environment itself (so π_B could be a tuple of policies):

- Disempowerment as loss of decisional power: if B chose a different policy, A could make a decision other than its best response to π_B, and that decision would yield A greater utility;
- Disempowerment as the need to cause certain outcomes: A has an incentive to control a set of variables X, and A can control the variables X, but A's best response to π_B is constrained w.r.t. X, in the sense that if the outcome of X were guaranteed regardless of A's actions, then A could pursue a different policy that would be at least as good.
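
As a rough illustration only, the first definition can be read in a toy one-shot game where A picks an action and B picks a policy. The sketch below is an assumption-laden simplification, not the formalisation developed in the project: the payoff table, the action and policy names, and the functions `best_response` and `is_disempowered` are all hypothetical, and the causal-model setting is replaced by a plain utility lookup.

```python
from typing import Dict, List, Tuple

# Hypothetical payoff table for A: UTILITY[(a_action, b_policy)] -> A's utility.
UTILITY: Dict[Tuple[str, str], float] = {
    ("stay", "block"): 1.0,
    ("move", "block"): 0.0,
    ("stay", "open"): 1.0,
    ("move", "open"): 3.0,
}
A_ACTIONS: List[str] = ["stay", "move"]
B_POLICIES: List[str] = ["block", "open"]


def best_response(utility, a_actions, b_policy):
    """Return A's utility-maximising action against a fixed policy of B."""
    return max(a_actions, key=lambda a: utility[(a, b_policy)])


def is_disempowered(utility, a_actions, b_policies, b_policy):
    """One toy reading of 'loss of decisional power' (an assumption).

    A counts as disempowered by b_policy if some alternative policy of B
    lets A take an action other than its best response to b_policy and
    obtain strictly more utility than that best response achieves.
    """
    br = best_response(utility, a_actions, b_policy)
    baseline = utility[(br, b_policy)]
    for alt in b_policies:
        if alt == b_policy:
            continue
        for a in a_actions:
            if a != br and utility[(a, alt)] > baseline:
                return True
    return False


# Under "block", A's best response is "stay" (utility 1), but if B played
# "open" instead, "move" would give A utility 3.
print(is_disempowered(UTILITY, A_ACTIONS, B_POLICIES, "block"))  # True
print(is_disempowered(UTILITY, A_ACTIONS, B_POLICIES, "open"))   # False
```

The check is deliberately existential over B's alternatives; whether the intended definition quantifies this way, or compares against a specific counterfactual policy, is not settled by the text above.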

How could this project be actively harmful?

Any mathematical formalisation of a notion that is to be avoided can also be used to train models that optimise for that notion. This project shares that risk.

What other funding is this person or project getting?

Pietro is a master's student with no other funding. Damiano receives a salary as a PhD student in Barcelona.

Damiano Fornasiere and Pietro Greiner

almost 2 years ago

Update.

Thanks to the funding, we spent three periods in London working with Tom Everitt, Jonathan Richens and Victoria Krakovna at Google DeepMind. We also joined the larger Causal Incentives Working Group and interacted with the AI safety community in London. The project continued to gain traction through the extension of MATS and the involvement of interested people. We also visited each other twice to work together.

As anticipated, we conducted a literature review on agency and empowerment, interacting with academics both in information theory and AI and in social sciences such as community psychology. We proposed and analysed different formalisations, relating the concept to safety-critical concepts such as harm, benefit, goal-directedness and control. We are currently working on a submission to a high-profile ML conference.

donated $60,000

Evan Hubinger

over 2 years ago

Main points in favor of this grant

Generally, my policy for funding independent research is that I look for the presence of a mentor with a solid research track record that will be overseeing the research. I think completely independent research is rarely a good idea for junior researchers, but if a more senior researcher is involved to guide the project and provide feedback, then I think it tends to go quite well. In this case, there will be a number of senior researchers such as Tom Everitt and Victoria Krakovna overseeing the project, which makes me feel quite good about it.

Donor's main reservations

I have some reservations about the utility of mathematical formalizations of agency, as I think it's somewhat unclear how useful such a formalization actually would be or what we would do with it. That being said, I don't see much downside risk, and I certainly think there are some cases where it could be quite useful, such as for constructing good evaluations for agency in models.

Process for deciding amount

I am recommending the amount that Damiano requested as I think it is a reasonable amount given his breakdown.

Conflicts of interest

None.