Aryaman Reddi ආර්යමන් රෙඩ්ඩි 🚀

PhD Student

About Me

I am a PhD student in reinforcement learning in the LiteRL group at the Technical University of Darmstadt, working in partnership with the Intelligent Autonomous Systems lab and hessian.AI, supervised by Professor Carlo D’Eramo 🎓

I am interested in developing sample-efficient techniques in deep multi-agent reinforcement learning using insights from applied mathematics and game theory 🕹️

I am a continuum hypothesis skeptic, a mereological universalist, and a Collatz conjecture supporter.

CV
Interests
  • Reinforcement Learning
  • Game Theory
  • Ethics and Philosophy
Education
  • PhD in Computer Science

    Technical University of Darmstadt, Germany

  • MEng & BA in Information and Computer Engineering

    University of Cambridge, United Kingdom

Experience

  1. Machine Learning Research Engineer

    Arm, Cambridge, United Kingdom
    • Developed an open-source tool (ML Inference Advisor) to optimise neural networks for inference on Arm GPUs using Python (PyTorch, TensorFlow, NumPy, Jupyter, Pandas), C++, Kubernetes, & Docker.
    • Achieved a 20% boost in GPU inference throughput by analysing TensorFlow operator efficiency using deep learning clustering and pruning techniques.
    • Enhanced IoT device performance by 12% by benchmarking over 30 system-on-chip (SoC) devices using Python, Jenkins CI, SQL, & Kubernetes.

Education

  1. PhD in Computer Science

    Technical University of Darmstadt, Germany

    My focus is on developing sample-efficient algorithms for exploration, coordination, and communication in multi-agent reinforcement learning using insights from applied mathematics and game theory.

    I believe bridging the gap between practical deep learning and theoretical models of stochastic optimisation is essential for scaling RL to real-world settings.

    I build algorithms that perform strongly in high-dimensional environments while providing mathematical insight grounded in probability theory, linear algebra, calculus, & functional analysis.

  2. MEng & BA in Information and Computer Engineering

    University of Cambridge, United Kingdom
    • Grade: Distinction (GPA 4.0 Equivalent)
    • Received the David Thompson prize for academic achievement
    Read Thesis
Publications and Collaborations
Can “consciousness” be observed from large language model (LLM) internal states? Dissecting LLM representations obtained from Theory of Mind test with Integrated Information Theory and Span Representation analysis

Robust Adversarial Reinforcement Learning via Bounded Rationality Curricula

Robustness against adversarial attacks and distribution shifts is a long-standing goal of Reinforcement Learning (RL). To this end, Robust Adversarial Reinforcement Learning (RARL) trains a protagonist against destabilizing forces exercised by an adversary in a competitive zero-sum Markov game, whose optimal solution, i.e., rational strategy, corresponds to a Nash equilibrium. However, finding Nash equilibria requires facing complex saddle point optimization problems, which can be prohibitive to solve, especially for high-dimensional control. In this paper, we propose a novel approach for adversarial RL based on entropy regularization to ease the complexity of the saddle point optimization problem. We show that the solution of this entropy-regularized problem corresponds to a Quantal Response Equilibrium (QRE), a generalization of Nash equilibria that accounts for bounded rationality, i.e., agents sometimes play random actions instead of optimal ones. Crucially, the connection between the entropy-regularized objective and QRE enables free modulation of the rationality of the agents by simply tuning the temperature coefficient. We leverage this insight to propose our novel algorithm, Quantal Adversarial RL (QARL), which gradually increases the rationality of the adversary in a curriculum fashion until it is fully rational, easing the complexity of the optimization problem while retaining robustness. We provide extensive evidence of QARL outperforming RARL and recent baselines across several MuJoCo locomotion and navigation problems in overall performance and robustness.
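
A hedged sketch of the construction described above, in LaTeX; the notation (protagonist policy \pi, adversary policy \nu, adversary action b_t, reward r, discount \gamma, temperature \alpha) is assumed for illustration rather than taken from the paper. Standard RARL solves the zero-sum saddle point

\[
\max_{\pi} \min_{\nu} \; \mathbb{E}_{\pi,\nu}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t, b_t) \right],
\]

whose solution is a Nash equilibrium. Entropy-regularising the adversary with temperature \alpha,

\[
\max_{\pi} \min_{\nu} \; \mathbb{E}_{\pi,\nu}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} \Big( r(s_t, a_t, b_t) - \alpha\, \mathcal{H}\big(\nu(\cdot \mid s_t)\big) \Big) \right],
\]

softens the saddle point: the minimiser now trades off destabilising the protagonist against keeping its policy random, and the solution is a Quantal Response Equilibrium. Annealing \alpha toward 0 over a curriculum restores a fully rational adversary, which is the mechanism QARL exploits.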

Contact

✉️ aryaman{}reddi{}tu-darmstadt.de

📍 E327, S2|02 Robert-Piloty-Gebäude, Technical University of Darmstadt, 64289 Darmstadt