Coeditor
ICLR 2024 Spotlight
Predicting the next code edit from repository-level diffs
This research became the foundation for Augment Code's Next Edit feature, used by thousands of developers daily.
Member of Technical Staff
I build AI systems that understand and write code.
I'm a member of technical staff at Microsoft Superintelligence, where I work on improving foundation models' software engineering capabilities across pre-training, mid-training, and post-training. Before this, I was a founding research scientist at Augment Code, where I led the ML research behind the Next Edit feature and built code completion and retrieval systems used by thousands of developers.
I received my PhD in Computer Science from UT Austin, where I studied how to combine deep learning with static program analysis for tasks like type inference and code editing. My research interests span LLM training, code generation, programming language theory, and reinforcement learning.
Designing pre-training data pipelines and chain-of-thought reasoning RL methods for foundation models.
Led the ML research behind the "Next Edit" feature. Built code completion models and static-analysis-augmented retrieval systems.
Trained expression-level code autocomplete models based on GPT-2.
UT Austin
Published at ICLR (spotlight), OOPSLA, FSE. Advisors: Isil Dillig & Greg Durrett.
USTC
University of Science and Technology of China. Foundation in mathematical reasoning and first-principles thinking.
Research spanning AI for code, program analysis, and machine learning.
Coeditor: predicting the next code edit from repository-level diffs (ICLR 2024 Spotlight)
This work became the foundation for Augment Code's Next Edit feature.
Teaching RL agents to prove program equivalence
Applied reinforcement learning to automated verification of relational program properties.
Simultaneous state estimation and dynamics learning from indirect observations
A Bayesian approach to jointly learning system dynamics and estimating state from indirect sensor data.
Research that ships to production.
Led the end-to-end ML pipeline: built the synthetic data generation system, designed and trained the code editing model, built evaluation infrastructure, and authored the research blog and feature launch blog.
Designed core algorithms and data pipelines to transform raw GitHub data into pre-training and mid-training corpora tailored for downstream software engineering tasks.
Researched and built a static-analysis-based synthetic data pipeline and signature-augmented retrieval system to improve code completion quality and relevance.
I'm always happy to discuss research collaborations or interesting problems in AI for code.
Say Hello
© 2026 Jiayi Wei