About me
I am a researcher in Reinforcement Learning, LLM Agents, Scalable Machine Learning, and ML-enhanced Systems Optimization. I completed my PhD in Computer Science at the Department of Computer Science and Technology, University of Cambridge, in January 2026. My research applies machine learning, especially reinforcement learning, to improve real systems such as databases, LLM fine-tuning pipelines, and LLM serving systems. I broadly describe this area as ML4Sys (and especially RL4Sys).
During my PhD, I was fortunate to be advised by Dr. Eiko Yoneki and Prof. Jon Crowcroft from the Machine Learning & Systems Research Group at the Computer Lab, where we explored the intersection of machine learning and systems with the goal of pushing the boundaries of both fields.
Before Cambridge, I received my Bachelor’s degree in Physics, with a minor in Mathematics, from Peking University, graduating with honors. There I received the Excellent Graduate Student Award from the School of Physics, and my undergraduate thesis was recognized as an Excellent Graduation Thesis. I was also awarded the Special Award at the 5th Youth Physics Tournament and the Freshman Scholarship at Peking University. I then completed a Master of Engineering in Computer Science at Johns Hopkins University.
My recent research and industry experience includes a Research Scientist internship at Google DeepMind, London, with the Autonomous Agents team led by Edward Grefenstette, where I worked on reinforcement learning fine-tuning (RLFT) and post-training pipelines for advanced agents. Before that, I was a Research Scientist Intern at Noah’s Ark Research Center UK, working on large-scale RLFT frameworks for VLM-based mobile agents. I was also the Co-founder and CTO of Powersense Technology Limited in Cambridge.
I am always keen to collaborate on exciting research problems and impactful industry projects.
News! 🚀
📍 (2026) [Google DeepMind 50-Page Tech Report] - A Subgoal-driven Framework for Improving Long-Horizon LLM Agents
- Our recent work on improving long-horizon LLM agents through subgoal-driven reasoning is currently under review.
📍 (2025.04) [IJCNN 2025] - OCMDP: Observation-Constrained Markov Decision Process
- Our recent work, OCMDP, was accepted by the International Joint Conference on Neural Networks (IJCNN) 2025. Feel free to check our paper here.
📍 (2025.02) [MLSys 2025] - ThunderServe: High-performance and Cost-efficient LLM Serving in Cloud Environments
- Our recent work, ThunderServe, was accepted by MLSys 2025. Feel free to check our paper here.
📍 (2025.01) [SIGMOD 2025] - A New Paradigm in Tuning Learned Indexes: A Reinforcement Learning-Enhanced Approach
- Our latest work on learned index tuning was accepted by SIGMOD 2025. Feel free to check our paper and project website here.
- I was also honored with the Student Award at ACM SIGMOD 2025.
📍 (2025.01) [ICLR 2025] - DistRL: An Asynchronous Distributed Reinforcement Learning Framework for On-Device Control Agents
- I am pleased to share that our latest work, DistRL, was accepted by ICLR 2025. We have also released the code and demos; feel free to check the project website here.
Selected Publications 📚
DistRL: An Asynchronous Distributed Reinforcement Learning Framework for On-Device Control Agents
Taiyi Wang, Zhihao Wu, Jianheng Liu, et al.
International Conference on Learning Representations (ICLR) 2025
An asynchronous distributed reinforcement learning framework for on-device control agents.

OCMDP: Observation-Constrained Markov Decision Process
Taiyi Wang, Jianheng Liu, Bryan Lee, Zhihao Wu, Yu Wu
International Joint Conference on Neural Networks (IJCNN) 2025
A study of observation-constrained Markov decision processes.

ThunderServe: High-performance and Cost-efficient LLM Serving in Cloud Environments
Youhe Jiang, Fangcheng Fu, Xiaozhe Yao, Taiyi Wang, Bin Cui, Ana Klimovic, Eiko Yoneki
Annual Conference on Machine Learning and Systems (MLSys) 2025
High-performance and cost-efficient LLM serving in cloud environments.
Academic Service and Awards
Academic Service
- Reviewer: ICLR 2025 (Highlighted Reviewer), ICLR 2026
- Reviewer: NeurIPS 2022, NeurIPS 2024 (Top Reviewer), NeurIPS 2025
- Reviewer: ICML 2025, ICML 2026
- Reviewer: SIGMOD 2026
- Program Committee: EuroMLSys 2022, 2023, 2024, 2025
Awards
- Student Award, ACM SIGMOD 2025
- Pillman and Cody Award, University of Cambridge
- Runner-up, Shenzhen Innovation and Entrepreneurship Competition, Global Final
- Runner-up, Chris Abell Postdoc Business Plan Competition, Cambridge
- Finalist (Top 1%), Mathematical Contest in Modeling (MCM)
- Excellent Graduate Student Award, School of Physics, Peking University
- Excellent Graduation Thesis, Peking University
- Special Award, 5th Youth Physics Tournament, Peking University
- Freshman Scholarship, Peking University
Interests and Activities
Beyond research, I have broad interests in sports, leadership, and long-term community engagement.
I am passionate about tennis and previously served as Captain of the Girton College Men’s 1st Tennis Team at the University of Cambridge, where I also helped the college win the University Cuppers’ Championship in 2024. I also enjoy skiing and other outdoor activities. Together, these pursuits continue to shape my approach to teamwork, discipline, and leadership.
Since 2015, I have been involved in long-term charitable work focused on supporting education for children in under-resourced areas. In addition, during my time at Peking University, I served as Director of the Debating Center and as a Co-organizer of the Students’ International Communication Association (SICA). These experiences strengthened my commitment to leadership, communication, and community building.
I also co-founded a startup with Dr. Borong Hu. You can find more details here: Powersense Ltd.

