Alexander Heckett

Rising Senior | Math Major | Carnegie Mellon University

About Me

Hi! I'm Xander.

My primary goal is to prevent existential risk from AI systems. I think AI systems have the potential to reason faster and more competently than any human being, and I think that dynamic could create serious problems for humanity. That said, I have grown increasingly indifferent to the actual mechanics of how neural networks behave: architectures are always liable to change, and I doubt that any result or technique tied to those mechanics is truly fundamental. Instead I feel drawn to studying the incentives and desires of AI systems at a more abstract, game-theoretic level. I am particularly drawn to the intersection of scalable oversight research, Guaranteed Safe AI research (or something like it), and AI control research. I have been very fortunate to work under Vincent Conitzer on research in this direction. I have also been involved in AI safety field-building: I co-founded Carnegie Mellon's AI Safety Initiative student club and helped run it for two years.

Within mathematics, I am most drawn to analysis. I have greatly enjoyed the graduate classes in analysis at Carnegie Mellon and I wish I had the luxury of immersing myself more deeply in the subject. I would also like to give a special shoutout to the friend groups of undergraduates I have met in these classes -- I feel very fortunate to have stumbled upon such an incubator of curiosity and support. I will also add that in high school I worked through a fair amount of graduate quantum mechanics material, and even if I've long since forgotten most of it, I think I will always be drawn to the aesthetic of theoretical physics research.

Finally, I do believe that theoretical insights can only go so far without some kind of real-world deployment. I've taken several semesters off from college to pursue software development projects stemming from my research interests and I'm always open to a good opportunity to implement something interesting!

Interests

These are the usual suspects for keeping me up at night:

AI Safety via Debate

How can we get AIs with differing interests to keep each other in check? How can we design games and reporting systems such that a much less capable overseer can be confident that more capable agents aren't getting up to anything undesired? I'm particularly interested in how debate protocols can benefit from letting debaters run computer programs to support their points, how debates can recurse into the assumptions behind those programs, how simulations can be used to resolve those sub-debates, and so on.
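
To make the recursive structure concrete, here is a minimal toy sketch in Python. It makes two loud simplifying assumptions: a claim is true iff all of its subclaims are, and the challenging debater always disputes a false subclaim when one exists. It also omits the program-running and simulation machinery described above; Claim and debate are illustrative names, not any real protocol's API.

    from dataclasses import dataclass, field

    @dataclass
    class Claim:
        statement: str
        truth: bool                        # ground truth, hidden from the judge
        subclaims: list["Claim"] = field(default_factory=list)

    def judge_leaf(claim: Claim) -> bool:
        # The weak overseer: it can only verify atomic claims directly.
        return claim.truth

    def debate(claim: Claim) -> bool:
        """Accept a claim iff the single disputed subclaim is accepted.
        The judge's work per round stays constant no matter how large
        the full argument tree is -- which is the whole appeal."""
        if not claim.subclaims:
            return judge_leaf(claim)
        # The challenger picks the subclaim it can actually win on.
        disputed = next((c for c in claim.subclaims if not c.truth),
                        claim.subclaims[0])
        return debate(disputed)

    # A big claim resting on two lemmas, one of which is flawed:
    root = Claim("theorem", False, [
        Claim("lemma 1", True, [Claim("fact A", True)]),
        Claim("lemma 2", False, [Claim("fact B", True), Claim("fact C", False)]),
    ])
    assert debate(root) is False   # the judge inspected exactly one leaf

Under those assumptions the honest strategy wins by construction; the interesting research questions start where the assumptions fail, e.g. when subclaims interact or the challenger can't reliably locate the flaw.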

AI for Mathematics

From a scalable oversight perspective, mathematics looks like an unusually tractable domain. Languages like Lean can efficiently determine whether a proof is correct, so the training signal for machine learning systems should be essentially perfect. Reinforcement learning has let AI systems outperform humans in many artificial games, yet AI agents haven't replaced mathematicians in practice. Why not? I also feel less guilty about improving AI mathematicians than about AI in most other domains, because theorem proving strikes me as a comparatively safe application of AI systems.
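
To make the verification point concrete, here is a tiny Lean 4 proof. The kernel either accepts it or rejects it mechanically, which is exactly what makes the reward signal so clean:

    -- Lean checks this proof with no human judgment in the loop.
    theorem add_comm_example (a b : Nat) : a + b = b + a :=
      Nat.add_comm a b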

Probabilistic Logic

I like to reason about AI Safety via Debate by assuming there is some underlying network of arguments and facts, and that debates traverse this network along a path jointly controlled by the debaters. That perspective raises a natural question: what does this network actually look like? One might consider a Bayesian network, some kind of maximum-entropy perspective, a more binary picture with logical constraints that must be satisfied, and so on. I've never been entirely satisfied by the canonical answers from argumentation theory, and I wonder if there's more to be said.
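
As one very speculative instantiation (picked from the options above purely for illustration), one could read the network as a noisy-or Bayesian network: each claim is supported by parent claims, and independent lines of support combine as a noisy-or. Every name and number below is made up:

    def noisy_or(support_probs):
        """P(claim) when each supporting argument independently
        establishes it with the given probability."""
        p_fail = 1.0
        for p in support_probs:
            p_fail *= 1.0 - p
        return 1.0 - p_fail

    # claim -> list of (supporting claim, strength of the inferential step)
    argument_graph = {
        "root":    [("lemma_a", 0.9), ("lemma_b", 0.7)],
        "lemma_a": [("fact_1", 0.95)],
        "lemma_b": [("fact_2", 0.6), ("fact_3", 0.8)],
    }
    leaf_priors = {"fact_1": 0.99, "fact_2": 0.5, "fact_3": 0.9}

    def belief(claim):
        if claim in leaf_priors:
            return leaf_priors[claim]
        # A parent supports the claim with probability
        # P(parent) * strength of the step from parent to claim.
        supports = [belief(parent) * strength
                    for parent, strength in argument_graph[claim]]
        return noisy_or(supports)

    print(round(belief("root"), 3))   # 0.933 on this toy graph

A debate, in this picture, is a walk through argument_graph whose branching is steered by the debaters; whether noisy-or (or any Bayesian reading) is the right semantics is exactly the open question.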