About me
I am a Member of Technical Staff at Reflection AI, working on advanced AI agents/skills and mid/post-training. My research interests span reinforcement learning, LLM agents, scalable machine learning, and ML-enhanced systems optimization. More broadly, I am interested in building intelligent systems that combine learning, reasoning, and robust infrastructure at scale.
I completed my PhD in Computer Science at the Department of Computer Science and Technology, University of Cambridge, where I was advised by Dr. Eiko Yoneki and Prof. Jon Crowcroft in the Machine Learning & Systems Research Group at the Computer Lab. My doctoral research explored the intersection of machine learning and systems, with a particular focus on applying reinforcement learning and learning-based methods to improve real-world systems such as databases, cloud services, and storage systems.
My current research focuses on large language models, VLA/VLM systems, and agent fine-tuning. I am especially interested in reinforcement learning and post-training methods for long-horizon agents, as well as the systems and infrastructure needed to support large-scale training and serving. I broadly describe this line of work as MLSys, with a particular emphasis on RLSys.
Before joining Reflection AI, I was a Research Scientist Intern at Google DeepMind, London, where I worked on the Autonomous Agents team led by Edward Grefenstette. My work focused on RLFT and post-training pipelines for advanced agents. I was also a Research Scientist Intern at Noah’s Ark Research Center UK, where I worked on large-scale RLFT frameworks for VLM-based mobile agents. Earlier, I co-founded and served as CTO of Powersense Technology Limited in Cambridge.
Prior to Cambridge, I received my Master of Engineering in Computer Science from Johns Hopkins University. I also received my Bachelor’s Degree in Physics with a minor in Mathematics from Peking University, where I graduated with honors. At Peking University, I received the Excellent Graduate Student Award from the School of Physics, recognition for my Excellent Graduation Thesis, the Special Award at the 5th Youth Physics Tournament, and the Freshman Scholarship.
I am always keen to collaborate on exciting research problems and impactful industry projects.
News! 🚀
📍 (2026) [Google DeepMind 50-Page Tech Report] - A Subgoal-driven Framework for Improving Long-Horizon LLM Agents
- Our recent work on improving long-horizon LLM agents through subgoal-driven reasoning is currently under review.
📍 (2025.04) [IJCNN 2025] - OCMDP: Observation-Constrained Markov Decision Process
- Our recent work, OCMDP, was accepted by the International Joint Conference on Neural Networks (IJCNN) 2025. Feel free to check our paper here.
📍 (2025.02) [MLSys 2025] - ThunderServe: High-performance and Cost-efficient LLM Serving in Cloud Environments
- Our recent work, ThunderServe, was accepted by MLSys 2025. Feel free to check our paper here.
📍 (2025.01) [SIGMOD 2025] - A New Paradigm in Tuning Learned Indexes: A Reinforcement Learning-Enhanced Approach
- Our latest work on learned index tuning was accepted by SIGMOD 2025. Feel free to check our paper and project website here.
- I was also honored with the Student Award at ACM SIGMOD 2025.
📍 (2025.01) [ICLR 2025] - DistRL: An Asynchronous Distributed Reinforcement Learning Framework for On-Device Control Agents
- I am pleased to share that our latest work, DistRL, was accepted by ICLR 2025. We also released code and demos; feel free to check the project website here.
Selected Publications 📚
A Subgoal-driven Framework for Improving Long-Horizon LLM Agents
Taiyi Wang, Sian Gooding, Florian Hartmann, Oriana Riva, Edward Grefenstette
Google DeepMind Technical Report
Advancing research on frontier agents for long-horizon tasks.
DistRL: An Asynchronous Distributed Reinforcement Learning Framework for On-Device Control Agents
Taiyi Wang, Zhihao Wu, Jianheng Liu, et al.
International Conference on Learning Representations (ICLR) 2025
An asynchronous distributed reinforcement learning framework for on-device control agents.

Academic Service and Awards
Academic Service
- Reviewer: ICLR 2025 (Highlighted Reviewer), ICLR 2026
- Reviewer: NeurIPS 2022, NeurIPS 2024 (Top Reviewer), NeurIPS 2025
- Reviewer: ICML 2025, ICML 2026
- Program Committee: EuroMLSys 2022, 2023, 2024, 2025
- Reviewer: SIGMOD 2026
Awards
- Student Award, ACM SIGMOD 2025
- Pillman and Cody Award, University of Cambridge
- Runner-up, Shenzhen Innovation and Entrepreneurship Competition, Global Final
- Runner-up, Chris Abell Postdoc Business Plan Competition, Cambridge
- Finalist (Top 1%), Mathematical Contest in Modeling (MCM)
- Excellent Graduate Student Award, School of Physics, Peking University
- Excellent Graduation Thesis, Peking University
- Special Award, 5th Youth Physics Tournament, Peking University
- Freshman Scholarship, Peking University
Interests and Activities
Beyond research, I have broad interests in sports, leadership, and long-term community engagement.
I am passionate about tennis and previously served as Captain of the Girton College Men’s 1st Tennis Team at the University of Cambridge. I also won the University Cuppers’ Championship for the college in 2024. More broadly, I enjoy skiing and other outdoor activities, which continue to shape my teamwork, discipline, and leadership style.
I have also been involved in long-term charitable activities since 2015, with a focus on supporting education for children in under-resourced areas. In addition, during my time at Peking University, I served as Director of the Debating Center and as a Co-organizer for the Students’ International Communication Association (SICA), experiences that strengthened my commitment to leadership, communication, and community building.
I also co-founded a startup with Dr. Borong Hu. You can find more details here: Powersense Ltd.

