About me
I am a researcher in Reinforcement Learning, LLM Agents, Scalable Machine Learning, and ML-enhanced Systems Optimization. I completed my PhD in Computer Science at the Department of Computer Science and Technology, University of Cambridge. My research focuses on large language models, VLA/VLM systems, and agent fine-tuning. I also have a strong background in applying machine learning, particularly reinforcement learning, to improve real-world systems such as databases, cloud services, and general storage systems. In addition, I bring solid systems and infrastructure expertise to support large-scale training and serving. I broadly describe this line of work as MLSys (especially RLSys).
During my PhD, I was fortunate to be advised by Dr. Eiko Yoneki and Prof. Jon Crowcroft from the Machine Learning & Systems Research Group at the Computer Lab, where we explored the intersection of machine learning and systems with the goal of pushing the boundaries of both fields.
Prior to Cambridge, I received my Bachelor’s Degree in Physics with a minor in Mathematics from Peking University, where I graduated with honors. I was honored with the Excellent Graduate Student Award from the School of Physics and received recognition for my Excellent Graduation Thesis. I was also awarded the Special Award at the 5th Youth Physics Tournament and the Freshman Scholarship at Peking University. I later completed my Master of Engineering in Computer Science at Johns Hopkins University.
My recent research and industry experience includes serving as a Research Scientist Intern at Google DeepMind, London, on the Autonomous Agents team led by Edward Grefenstette, where I worked on RLFT and post-training pipelines for advanced agents. Before that, I was a Research Scientist Intern at Noah’s Ark Research Center UK, working on large-scale RLFT frameworks for VLM-based mobile agents. I was also the Co-founder and CTO of Powersense Technology Limited in Cambridge.
I am always keen to collaborate on exciting research problems and impactful industry projects.
News! 🚀
📍 (2026) [Google DeepMind 50-Page Tech Report] - A Subgoal-driven Framework for Improving Long-Horizon LLM Agents
- Our recent work on improving long-horizon LLM agents through subgoal-driven reasoning is currently under review.
📍 (2025.04) [IJCNN 2025] - OCMDP: Observation-Constrained Markov Decision Process
- Our recent work, OCMDP, was accepted by the International Joint Conference on Neural Networks (IJCNN) 2025. Feel free to check our paper here.
📍 (2025.02) [MLSys 2025] - ThunderServe: High-performance and Cost-efficient LLM Serving in Cloud Environments
- Our recent work, ThunderServe, was accepted by MLSys 2025. Feel free to check our paper here.
📍 (2025.01) [SIGMOD 2025] - A New Paradigm in Tuning Learned Indexes: A Reinforcement Learning-Enhanced Approach
- Our latest work on learned index tuning was accepted by SIGMOD 2025. Feel free to check our paper and project website here.
- I was also honored with the Student Award at ACM SIGMOD 2025.
📍 (2025.01) [ICLR 2025] - DistRL: An Asynchronous Distributed Reinforcement Learning Framework for On-Device Control Agents
- I am pleased to share that our latest work, DistRL, was accepted by ICLR 2025. We also released code and demos — feel free to check the project website here.
Selected Publications 📚
A Subgoal-driven Framework for Improving Long-Horizon LLM Agents
Taiyi Wang, Sian Gooding, Florian Hartmann, Oriana Riva, Edward Grefenstette
Google DeepMind Technical Report
Advancing research on frontier agents for long-horizon tasks.
DistRL: An Asynchronous Distributed Reinforcement Learning Framework for On-Device Control Agents
Taiyi Wang, Zhihao Wu, Jianheng Liu, et al.
International Conference on Learning Representations (ICLR) 2025
An asynchronous distributed reinforcement learning framework for on-device control agents.

Academic Service and Awards
Academic Service
- Reviewer: ICLR 2025 (Highlighted Reviewer), ICLR 2026
- Reviewer: NeurIPS 2022, NeurIPS 2024 (Top Reviewer), NeurIPS 2025
- Reviewer: ICML 2025, ICML 2026
- Program Committee: EuroMLSys 2022, 2023, 2024, 2025
- Reviewer: SIGMOD 2026
Awards
- Student Award, ACM SIGMOD 2025
- Pillman and Cody Award, University of Cambridge
- Runner-up, Shenzhen Innovation and Entrepreneurship Competition, Global Final
- Runner-up, Chris Abell Postdoc Business Plan Competition, Cambridge
- Finalist (Top 1%), Mathematical Contest in Modeling (MCM)
- Excellent Graduate Student Award, School of Physics, Peking University
- Excellent Graduation Thesis, Peking University
- Special Award, 5th Youth Physics Tournament, Peking University
- Freshman Scholarship, Peking University
Interests and Activities
Beyond research, I have broad interests in sports, leadership, and long-term community engagement.
I am passionate about tennis and previously served as Captain of the Girton College Men’s 1st Tennis Team at the University of Cambridge. I also won the University Cuppers’ Championship for the college in 2024. More broadly, I enjoy skiing and other outdoor activities, which continue to shape my teamwork, discipline, and leadership style.
I have also been involved in long-term charitable activities since 2015, with a focus on supporting education for children in under-resourced areas. In addition, during my time at Peking University, I served as Director of the Debating Center and as a Co-organizer for the Students’ International Communication Association (SICA), experiences that strengthened my commitment to leadership, communication, and community building.
I also co-founded a startup with Dr. Borong Hu. You can find more details here: Powersense Ltd.

