Coeditor
ICLR 2024 Spotlight
Predicting the next code edit from repository-level diffs
This research became the foundation for Augment Code's Next Edit feature, used by thousands of developers daily.
Member of Technical Staff
I build AI systems that understand and write code.
I'm a member of technical staff at Microsoft Superintelligence, where I work on improving foundation models' software engineering capabilities across pre-training, mid-training, and post-training. Before this, I was a founding research scientist at Augment Code, where I led the ML research behind the Next Edit feature and built code completion and retrieval systems used by thousands of developers.
I received my PhD in Computer Science from UT Austin, where I studied how to combine deep learning with static program analysis for tasks like type inference and code editing. My research interests span LLM training, code generation, programming language theory, and reinforcement learning.
Designing pre-training data pipelines and chain-of-thought reasoning RL methods for foundation models.
Led the ML research behind the "Next Edit" feature. Built code completion models and static-analysis-augmented retrieval systems.
Trained expression-level code autocomplete models based on GPT-2.
UT Austin
Published at ICLR (spotlight), OOPSLA, FSE. Advisors: Isil Dillig & Greg Durrett.
USTC
University of Science and Technology of China. Foundation in mathematical reasoning and first-principles thinking.
Research spanning AI for code, program analysis, and machine learning.
Coeditor: predicting the next code edit from repository-level diffs (ICLR 2024 Spotlight)
This work became the foundation for Augment Code's Next Edit feature.
Teaching RL agents to prove program equivalence
Applied reinforcement learning to automated verification of relational program properties.
Simultaneous state estimation and dynamics learning from indirect observations
A Bayesian approach to jointly learning system dynamics and estimating state from indirect sensor data.
Research that ships to production.
Led the end-to-end ML pipeline: built the synthetic data generation system, designed and trained the code editing model, built evaluation infrastructure, and authored the research blog and feature launch blog.
Designed core algorithms and data pipelines to transform raw GitHub data into pre-training and mid-training corpora tailored for downstream software engineering tasks.
Researched and built a static-analysis-based synthetic data pipeline and signature-augmented retrieval system to improve code completion quality and relevance.
I'm always happy to discuss research collaborations or interesting problems in AI for code.
Say Hello
© 2026 Jiayi Wei