About me

I am a Member of Technical Staff at Reflection AI, working on advanced AI agents/skills and mid/post-training. My research interests span reinforcement learning, LLM agents, scalable machine learning, and ML-enhanced systems optimization. More broadly, I am interested in building intelligent systems that combine learning, reasoning, and robust infrastructure at scale.

I completed my PhD in Computer Science at the Department of Computer Science and Technology, University of Cambridge, where I was advised by Dr. Eiko Yoneki and Prof. Jon Crowcroft in the Machine Learning & Systems Research Group at the Computer Lab. My doctoral research explored the intersection of machine learning and systems, with a particular focus on applying reinforcement learning and learning-based methods to improve real-world systems such as databases, cloud services, and storage systems.

My current research focuses on large language models, VLA/VLM systems, and agent fine-tuning. I am especially interested in reinforcement learning and post-training methods for long-horizon agents, as well as the systems and infrastructure needed to support large-scale training and serving. I broadly describe this line of work as MLSys, with a particular emphasis on RLSys.

Before joining Reflection AI, I was a Research Scientist Intern at Google DeepMind, London, where I worked on the Autonomous Agents team led by Edward Grefenstette. My work focused on RLFT and post-training pipelines for advanced agents. I was also a Research Scientist Intern at Noah’s Ark Research Center UK, where I worked on large-scale RLFT frameworks for VLM-based mobile agents. Earlier, I co-founded and served as CTO of Powersense Technology Limited in Cambridge.

Prior to Cambridge, I received my Master of Engineering in Computer Science from Johns Hopkins University. I also received my Bachelor’s Degree in Physics with a minor in Mathematics from Peking University, where I graduated with honors. At Peking University, I received the Excellent Graduate Student Award from the School of Physics, recognition for my Excellent Graduation Thesis, the Special Award at the 5th Youth Physics Tournament, and the Freshman Scholarship.

I am always keen to collaborate on exciting research problems and impactful industry projects.

News! 🚀

📍 (2026) [Google DeepMind 50-Page Tech Report] - A Subgoal-driven Framework for Improving Long-Horizon LLM Agents

Our recent work on improving long-horizon LLM agents through subgoal-driven reasoning is currently under review.

📍 (2025.04) [IJCNN 2025] - OCMDP: Observation-Constrained Markov Decision Process

Our recent work, OCMDP, was accepted by the International Joint Conference on Neural Networks (IJCNN) 2025. Feel free to check our paper here.

📍 (2025.02) [MLSys 2025] - ThunderServe: High-performance and Cost-efficient LLM Serving in Cloud Environments

Our recent work, ThunderServe, was accepted by MLSys 2025. Feel free to check our paper here.

📍 (2025.01) [SIGMOD 2025] - A New Paradigm in Tuning Learned Indexes: A Reinforcement Learning-Enhanced Approach

Our latest work on learned index tuning was accepted by SIGMOD 2025. Feel free to check our paper and project website here.
I was also honored with the Student Award at ACM SIGMOD 2025.

📍 (2025.01) [ICLR 2025] - DistRL: An Asynchronous Distributed Reinforcement Learning Framework for On-Device Control Agents

I am pleased to share that our latest work, DistRL, was accepted by ICLR 2025. We also released code and demos; feel free to check the project website here.

Selected Publications 📚

A Subgoal-driven Framework for Improving Long-Horizon LLM Agents

Taiyi Wang, Sian Gooding, Florian Hartmann, Oriana Riva, Edward Grefenstette

Google DeepMind Technical Report

Advancing research on frontier agents for long-horizon tasks.

Paper

DistRL: An Asynchronous Distributed Reinforcement Learning Framework for On-Device Control Agents

Taiyi Wang, Zhihao Wu, Jianheng Liu, et al.

International Conference on Learning Representations (ICLR) 2025

An asynchronous distributed reinforcement learning framework for on-device control agents.

Paper Code Project

A New Paradigm in Tuning Learned Indexes: A Reinforcement Learning-Enhanced Approach

Taiyi Wang, Liang Liang, Guang Yang, Thomas Heinis, Eiko Yoneki

International Conference on Management of Data (SIGMOD) 2025

A reinforcement learning-enhanced approach for tuning learned indexes.

Paper Project

Academic Service and Awards

Academic Service

Reviewer: ICLR 2025 (Highlighted Reviewer), ICLR 2026
Reviewer: NeurIPS 2022, NeurIPS 2024 (Top Reviewer), NeurIPS 2025
Reviewer: ICML 2025, ICML 2026
Program Committee: EuroMLSys 2022, 2023, 2024, 2025
Reviewer: SIGMOD 2026

Awards

Student Award, ACM SIGMOD 2025
Pillman and Cody Award, University of Cambridge
Runner-up, Shenzhen Innovation and Entrepreneurship Competition, Global Final
Runner-up, Chris Abell Postdoc Business Plan Competition, Cambridge
Finalist (Top 1%), Mathematical Contest in Modeling (MCM)
Excellent Graduate Student Award, School of Physics, Peking University
Excellent Graduation Thesis, Peking University
Special Award, 5th Youth Physics Tournament, Peking University
Freshman Scholarship, Peking University

Interests and Activities

Beyond research, I have broad interests in sports, leadership, and long-term community engagement.

I am passionate about tennis and previously served as Captain of the Girton College Men’s 1st Tennis Team at the University of Cambridge. I also won the University Cuppers’ Championship for the college in 2024. More broadly, I enjoy skiing and other outdoor activities, which continue to shape my teamwork, discipline, and leadership style.

I have also been involved in long-term charitable activities since 2015, with a focus on supporting education for children in under-resourced areas. In addition, during my time at Peking University, I served as Director of the Debating Center and as a Co-organizer for the Students’ International Communication Association (SICA), experiences that strengthened my commitment to leadership, communication, and community building.

I also co-founded a startup with Dr. Borong Hu. You can find more details here: Powersense Ltd.
