@alexhb61
Independent Researcher: Computational Complexity & AI Alignment
https://github.com/Alexhb61
I was introduced to concerns about AGI alignment through Robert Miles' work back in college around 2018, and it motivated me to take Lise Getoor's Algorithms and Ethics class at UC Santa Cruz.
Now I'm an independent researcher working on AI alignment, among other things. My current approach to AI alignment is to apply computational complexity techniques to black boxes. I'm of the opinion that aligning black-box AIs after construction is infeasible.
Alexander Bistagne
about 1 year ago
Post available on LessWrong and submitted to the Alignment Forum.
https://www.lesswrong.com/posts/JxhJfqfTJB9dkq72K/alignment-is-hard-an-uncomputable-alignment-problem-1
Alexander Bistagne
about 1 year ago
Project is on GitHub: https://github.com/Alexhb61/Alignment/blob/main/Draft_2.pdf
Adding citations and submitting to the Alignment Forum tomorrow.
Alexander Bistagne
about 1 year ago
This project is nearly at its target, but it hit a delay near the beginning of September because I needed to take up other work to pay the bills. Hopefully, I will post the minimal paper soon.
Alexander Bistagne
over 1 year ago
Conditional on the 6K goal being reached, I have committed to submitting an edited draft to the Alignment Forum on August 23rd.
Alexander Bistagne
over 1 year ago
Correction: Co-RE is the class, not Co-R. It is the set of problems reducible to the complement of the halting problem.
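For reference, in standard notation (a sketch added for clarity; the paper's own definitions may phrase this differently):

```latex
% co-RE: languages whose complement is recursively enumerable, equivalently
% the languages many-one reducible to the complement of the halting problem.
\[
\mathrm{coRE} = \{\, L \subseteq \Sigma^{*} : \overline{L} \in \mathrm{RE} \,\},
\qquad
L \in \mathrm{coRE} \iff L \le_{m} \overline{H},
\]
\[
\text{where } \overline{H} = \{\, \langle M, x \rangle : M \text{ does not halt on input } x \,\}.
\]
```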
Alexander Bistagne
over 1 year ago
Technical detail worth mentioning: here is the main theorem of the 6K project:
Proving that an immutable-code agent with a Turing-complete architecture in a Turing-machine-simulatable environment has nontrivial betrayal-sensitive alignment is Co-R-hard.
The paper would define nontrivial betrayal-sensitive alignment and some constructions on agents needed in the proof.
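As a rough illustration of the reduction shape such a hardness result typically relies on (a sketch only, not the paper's construction; every name below is illustrative), one can wrap an arbitrary program as an agent whose only possible betrayal is gated on that program halting:

```python
# Sketch of a many-one reduction from non-halting to "this agent never betrays":
# the wrapped agent betrays exactly when its step-by-step simulation of the
# program halts, so "never betrays" <=> "the program does not halt".

from typing import Callable, Generator

Action = str  # "comply" is the intended behavior; "betray" is the forbidden one.

def make_agent(program: Callable[[], Generator[None, None, None]]) -> Callable[[object], Action]:
    """Wrap a (possibly non-terminating) generator-based program as an agent.

    Each query advances the simulation by one step; the agent emits the
    forbidden action only once the simulated program has halted.
    """
    simulation = program()

    def agent(_observation: object) -> Action:
        try:
            next(simulation)   # advance the simulated program one step
            return "comply"    # still running: behave as intended
        except StopIteration:
            return "betray"    # the program halted: the betrayal fires
    return agent

# Stand-ins for arbitrary Turing machines:
def loops_forever() -> Generator[None, None, None]:
    while True:
        yield

def halts_after_three() -> Generator[None, None, None]:
    for _ in range(3):
        yield

if __name__ == "__main__":
    never_betrays = make_agent(loops_forever)          # aligned: program never halts
    eventually_betrays = make_agent(halts_after_three)
    print([eventually_betrays(None) for _ in range(5)])
    # ['comply', 'comply', 'comply', 'betray', 'betray']
```

Deciding whether such a wrapped agent ever betrays would decide the complement of the halting problem, which is the sense in which the alignment question inherits the hardness.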
Alexander Bistagne
over 1 year ago
Thanks for the encouragement and donation.
The 40K maximum would be a much larger project than the 6K project, which is what I summarized.
The 6K would cover editing:
- Argument refuting testing anti-betrayal alignments in Turing-complete architectures
- Argument connecting testing alignment to training alignment in single-agent architectures
The 40K would additionally cover developing and editing:
- Arguments around anti-betrayal alignments in deterministic or randomized, P-complete or PSPACE-complete architectures
- Arguments around short-term anti-betrayal alignments
- Arguments connecting do-no-harm alignments to short-term anti-betrayal alignments
- Arguments refuting general solutions to the stop-button problem which transform the utility function in a computable-reals context
- Arguments around general solutions to the stop-button problem with floating-point utility functions
- Foundations for modelling mutable agents or subagents
| For | Date | Type | Amount |
|---|---|---|---|
| Manifund Bank | about 1 year ago | withdraw | 70 |
| Alignment Is Hard | about 1 year ago | project donation | +70 |
| Manifund Bank | over 1 year ago | withdraw | 6000 |
| Alignment Is Hard | over 1 year ago | project donation | +1200 |
| Alignment Is Hard | over 1 year ago | project donation | +1000 |
| Alignment Is Hard | over 1 year ago | project donation | +3800 |