Preprints and Conference Publications
Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit.
Jason D. Lee, Kazusato Oko, Taiji Suzuki, Denny Wu.
NeurIPS 2024.
Scaling Laws in Linear Regression: Compute, Parameters, and Data.
Licong Lin, Jingfeng Wu, Sham M. Kakade, Peter L. Bartlett, Jason D. Lee.
NeurIPS 2024.
Stochastic Zeroth-Order Optimization under Strong Convexity and Lipschitz Hessian: Minimax Sample Complexity.
Qian Yu, Yining Wang, Baihe Huang, Qi Lei, and Jason D. Lee.
NeurIPS 2024.
BitDelta: Your Fine-Tune May Only Be Worth One Bit.
James Liu, Guangxuan Xiao, Kai Li, Jason D. Lee, Song Han, Tri Dao, and Tianle Cai.
NeurIPS 2024.
REBEL: Reinforcement Learning via Regressing Relative Rewards .
Zhaolin Gao, Jonathan D. Chang, Wenhao Zhan, Owen Oertell, Gokul Swamy, Kiante Brantley, Thorsten Joachims, J. Andrew Bagnell, Jason D. Lee, Wen Sun.
NeurIPS 2024.
Learning and Transferring Sparse Contextual Bigrams with Linear Transformers.
Yunwei Ren, Zixuan Wang, and Jason D. Lee.
How Transformers Learn Causal Structure with Gradient Descent.
Eshaan Nichani, Alex Damian, and Jason D. Lee.
ICML 2024.
An Information-Theoretic Analysis of In-Context Learning.
Hong Jun Jeon, Jason D. Lee, Qi Lei, Benjamin Van Roy.
ICML 2024.
LoRA Training in the NTK Regime has No Spurious Local Minima.
Uijeong Jang, Jason D. Lee, and Ernest K. Ryu.
ICML 2024.
Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark.
Yihua Zhang et al.
ICML 2024.
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads.
Tianle Cai, Yuhong Li, Zhengyang Geng, Hongwu Peng, Jason D. Lee, Deming Chen, and Tri Dao.
Code and Blog.
ICML 2024.
Transformers Provably Learn Sparse Token Selection While Fully-Connected Nets Cannot.
Zixuan Wang, Stanley Wei, Daniel Hsu, Jason D. Lee.
ICML 2024.
Computational-Statistical Gaps in Gaussian Single-Index Models.
Alex Damian, Loucas Pillaud-Vivien, Jason D. Lee, Joan Bruna.
COLT 2024.
Optimal Multi-Distribution Learning.
Zihan Zhang, Wenhao Zhan, Yuxin Chen, Simon S. Du, and Jason D. Lee.
COLT 2024.
Settling the Sample Complexity of Online Reinforcement Learning.
Zihan Zhang, Yuxin Chen, Jason D. Lee, and Simon S. Du.
COLT 2024.
Provable Offline Reinforcement Learning with Human Feedback.
Wenhao Zhan, Masatoshi Uehara, Nathan Kallus, Jason D. Lee, and Wen Sun.
ICLR 2024.
Provable Reward-Agnostic Preference-Based Reinforcement Learning.
Wenhao Zhan, Masatoshi Uehara, Nathan Kallus, Jason D. Lee, and Wen Sun.
ICLR 2024.
Horizon-Free Regret for Linear Markov Decision Processes.
Zihan Zhang, Jason D. Lee, Yuxin Chen, Simon S. Du.
ICLR 2024.
Provably Efficient CVaR RL in Low-rank MDPs.
Yulai Zhao, Wenhao Zhan, Xiaoyan Hu, Ho-fung Leung, Farzan Farnia, Wen Sun, and Jason D. Lee.
ICLR 2024.
Learning Hierarchical Polynomials with Three-Layer Neural Networks.
Zihao Wang, Eshaan Nichani, and Jason D. Lee.
ICLR 2024.
Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking.
Kaifeng Lyu, Jikai Jin, Zhiyuan Li, Simon S. Du, Jason D. Lee, and Wei Hu.
ICLR 2024.
Teaching Arithmetic to Small Transformers.
Nayoung Lee, Kartik Sreenivasan, Jason D. Lee, Kangwook Lee, and Dimitris Papailiopoulos.
ICLR 2024.
REST: Retrieval-Based Speculative Decoding.
Zhenyu He, Zexuan Zhong, Tianle Cai, Jason D Lee, and Di He.
Blog post and Code.
NAACL 2024.
Dataset Reset Policy Optimization for RLHF.
Jonathan D. Chang, Wenhao Zhan, Owen Oertell, Kiante Brantley, Dipendra Misra, Jason D. Lee, and Wen Sun.
Scaling In-Context Demonstrations with Structured Attention.
Tianle Cai, Kaixuan Huang, Jason D. Lee, and Mengdi Wang.
Towards Optimal Statistical Watermarking.
Baihe Huang, Banghua Zhu, Hanlin Zhu, Jason D. Lee, Jiantao Jiao, and Michael I. Jordan.
Reward Collapse in Aligning Large Language Models.
Ziang Song, Tianle Cai, Jason D. Lee, and Weijie J. Su.
How Well Can Transformers Emulate In-context Newton's Method?
Angeliki Giannou, Liu Yang, Tianhao Wang, Dimitris Papailiopoulos, and Jason D. Lee.
Fine-Tuning Language Models with Just Forward Passes.
Sadhika Malladi, Tianyu Gao, Eshaan Nichani, Alex Damian, Jason D. Lee, Danqi Chen, and Sanjeev Arora.
NeurIPS 2023.
Sample Complexity for Quadratic Bandits: Hessian Dependent Bounds and Optimal Algorithms.
Qian Yu, Yining Wang, Baihe Huang, Qi Lei, and Jason D. Lee.
NeurIPS 2023.
Smoothing the Landscape Boosts the Signal for SGD: Optimal Sample Complexity for Learning Single Index Models.
Alex Damian, Eshaan Nichani, Rong Ge, and Jason D. Lee.
NeurIPS 2023.
Implicit Bias of Gradient Descent for Logistic Regression at the Edge of Stability.
Jingfeng Wu, Vladimir Braverman, and Jason D. Lee.
NeurIPS 2023.
Reward-agnostic Fine-tuning: Provable Statistical Benefits of Hybrid Reinforcement Learning.
Gen Li, Wenhao Zhan, Jason D. Lee, Yuejie Chi, and Yuxin Chen.
NeurIPS 2023.
Provable Guarantees for Nonlinear Feature Learning in Three-Layer Neural Networks.
Eshaan Nichani, Alex Damian, and Jason D. Lee.
NeurIPS 2023.
Offline Minimax Soft-Q-learning Under Realizability and Partial Coverage.
Masatoshi Uehara, Nathan Kallus, Jason D. Lee, and Wen Sun.
NeurIPS 2023.
Looped Transformers as Programmable Computers.
Angeliki Giannou, Shashank Rajput, Jy-yong Sohn, Kangwook Lee, Jason D. Lee, and Dimitris Papailiopoulos.
ICML 2023.
Understanding Incremental Learning of Gradient Descent: A Fine-grained Analysis of Matrix Sensing.
Jikai Jin, Zhiyuan Li, Kaifeng Lyu, Simon S. Du, and Jason D. Lee.
ICML 2023.
Computationally Efficient PAC RL in POMDPs with Latent Determinism and Conditional Embeddings.
Masatoshi Uehara, Ayush Sekhari, Jason D. Lee, Nathan Kallus, and Wen Sun.
ICML 2023.
Efficient displacement convex optimization with particle gradient descent.
Hadi Daneshmand, Jason D. Lee, and Chi Jin.
ICML 2023.
Local Optimization Achieves Global Optimality in Multi-Agent Reinforcement Learning.
Yulai Zhao, Zhuoran Yang, Zhaoran Wang, and Jason D. Lee.
ICML 2023.
Self-Stabilization: The Implicit Bias of Gradient Descent at the Edge of Stability.
Alex Damian, Eshaan Nichani, and Jason D. Lee.
ICLR 2023.
Decentralized Optimistic Hyperpolicy Mirror Descent: Provably No-Regret Learning in Markov Games.
Wenhao Zhan, Jason D. Lee, and Zhuoran Yang.
ICLR 2023.
PAC Reinforcement Learning for Predictive State Representations.
Wenhao Zhan, Masatoshi Uehara, Wen Sun, and Jason D. Lee.
ICLR 2023.
Can We Find Nash Equilibria at a Linear Rate in Markov Games?
Zhuoqing Song, Jason D. Lee, and Zhuoran Yang.
ICLR 2023.
Reconstructing Training Data from the Loss Gradient, Provably.
Zihan Wang, Jason D. Lee, Qi Lei.
AISTATS 2023.
Provable Hierarchy-Based Meta-Reinforcement Learning.
Kurtland Chua, Qi Lei, and Jason D. Lee.
AISTATS 2023.
Provably Efficient Reinforcement Learning via Surprise Bound.
Hanlin Zhu, Ruosong Wang, and Jason D. Lee.
AISTATS 2023.
Optimal Sample Complexity Bounds for Non-convex Optimization under Kurdyka-Lojasiewicz Condition.
Qian Yu, Yining Wang, Baihe Huang, Qi Lei, and Jason D. Lee.
AISTATS 2023.
Implicit Bias of Gradient Descent on Reparametrized Models: On Equivalence to Mirror Descent.
Zhiyuan Li, Tianhao Wang, Jason D. Lee, and Sanjeev Arora.
NeurIPS 2022.
Neural Networks can Learn Representations with Gradient Descent.
Alex Damian, Jason D. Lee, and Mahdi Soltanolkotabi.
COLT 2022.
Provably Efficient Reinforcement Learning in Partially Observable Dynamical Systems.
Masatoshi Uehara, Ayush Sekhari, Jason D. Lee, Nathan Kallus, and Wen Sun.
NeurIPS 2022.
From Gradient Flow on Population Loss to Learning with Stochastic Gradient Descent.
Christopher De Sa, Satyen Kale, Jason D. Lee, Ayush Sekhari, and Karthik Sridharan.
NeurIPS 2022.
Offline Reinforcement Learning with Realizability and Single-policy Concentrability.
Wenhao Zhan, Baihe Huang, Audrey Huang, Nan Jiang, and Jason D. Lee.
COLT 2022.
Identifying good directions to escape the NTK regime and efficiently learn low-degree plus sparse polynomials.
Eshaan Nichani, Yu Bai, and Jason D. Lee.
NeurIPS 2022.
On the Effective Number of Linear Regions in Shallow Univariate ReLU Networks: Convergence Guarantees and Implicit Bias.
Itay Safran, Gal Vardi, and Jason D. Lee.
NeurIPS 2022.
Nearly Minimax Algorithms for Linear Bandits with Shared Representation.
Jiaqi Yang, Qi Lei, Jason D. Lee, and Simon S. Du.
Optimization-Based Separations for Neural Networks.
Itay Safran and Jason D. Lee.
COLT 2022.
Provable Regret Bounds for Deep Online Learning and Control.
Xinyi Chen, Edgar Minasyan, Jason D. Lee, Elad Hazan.
Optimal Gradient-based Algorithms for Non-concave Bandit Optimization.
Baihe Huang, Kaixuan Huang, Sham M. Kakade, Jason D. Lee, Qi Lei, Runzhe Wang, and Jiaqi Yang.
NeurIPS 2021. Video and Slides from Qi Lei.
Going Beyond Linear RL: Sample Efficient Neural Function Approximation.
Baihe Huang, Kaixuan Huang, Sham M. Kakade, Jason D. Lee, Qi Lei, Runzhe Wang, and Jiaqi Yang.
NeurIPS 2021.
Towards General Function Approximation in Zero-Sum Markov Games.
Baihe Huang, Jason D. Lee, Zhaoran Wang, and Zhuoran Yang.
ICLR 2022.
A Short Note on the Relationship of Information Gain and Eluder Dimension.
Kaixuan Huang, Sham M. Kakade, Jason D. Lee, and Qi Lei.
Label Noise SGD Provably Prefers Flat Global Minimizers.
Alex Damian, Tengyu Ma, and Jason D. Lee.
NeurIPS 2021.
How Fine-Tuning Allows for Effective Meta-Learning.
Kurtland Chua, Qi Lei, and Jason D. Lee.
NeurIPS 2021.
Policy Mirror Descent for Regularized Reinforcement Learning: A Generalized Framework with Linear Convergence.
Wenhao Zhan, Shicong Cen, Baihe Huang, Yuxin Chen, Jason D. Lee, and Yuejie Chi.
Predicting What You Already Know Helps: Provable Self-Supervised Learning.
Jason D. Lee, Qi Lei, Nikunj Saunshi, and Jiacheng Zhuo.
NeurIPS 2021.
Modeling from Features: a Mean-field Framework for Over-parameterized Deep Neural Networks.
Cong Fang, Jason D. Lee, Pengkun Yang, Tong Zhang.
COLT 2021.
Shape Matters: Understanding the Implicit Bias of the Noise Covariance.
Jeff Z. HaoChen, Colin Wei, Jason D. Lee, and Tengyu Ma.
COLT 2021.
Bilinear Classes: A Structural Framework for Provable Generalization in RL.
Simon S. Du, Sham M. Kakade, Jason D. Lee, Shachar Lovett, Gaurav Mahajan, Wen Sun, and Ruosong Wang.
ICML 2021.
Provably Efficient Policy Gradient Methods for Two-Player Zero-Sum Markov Games.
Yulai Zhao, Yuandong Tian, Jason D. Lee, and Simon S. Du.
AISTATS 2022.
Near-Optimal Linear Regression under Distribution Shift.
Qi Lei, Wei Hu, and Jason D. Lee.
ICML 2021.
A Theory of Label Propagation for Subpopulation Shift.
Tianle Cai, Ruiqi Gao, Jason D. Lee, and Qi Lei.
ICML 2021.
How Important is the Train-Validation Split in Meta-Learning?
Yu Bai, Minshuo Chen, Pan Zhou, Tuo Zhao, Jason D. Lee, Sham Kakade, Huan Wang, and Caiming Xiong.
ICML 2021.
Provable Benefits of Representation Learning in Linear Bandits.
Jiaqi Yang, Wei Hu, Jason D. Lee, Simon S. Du.
ICLR 2021.
Few-Shot Learning via Learning the Representation, Provably.
Simon S. Du, Wei Hu, Sham M. Kakade, Jason D. Lee, and Qi Lei.
ICLR 2021.
Towards Understanding Hierarchical Learning: Benefits of Neural Representations.
Minshuo Chen, Yu Bai, Jason D. Lee, Tuo Zhao, Huan Wang, Caiming Xiong, and Richard Socher.
NeurIPS 2020.
Sanity-Checking Pruning Methods: Random Tickets can Win the Jackpot.
Jingtong Su, Yihang Chen, Tianle Cai, Tianhao Wu, Ruiqi Gao, Liwei Wang, and Jason D. Lee.
NeurIPS 2020.
Agnostic Q-learning with Function Approximation in Deterministic Systems: Tight Bounds on Approximation Error and Sample Complexity.
Simon S. Du, Jason D. Lee, Gaurav Mahajan, and Ruosong Wang.
NeurIPS 2020.
Implicit Bias in Deep Linear Classification: Initialization Scale vs Training Accuracy.
Edward Moroshko, Suriya Gunasekar, Blake Woodworth, Jason D. Lee, Nathan Srebro, Daniel Soudry.
NeurIPS 2020.
Beyond Lazy Training for Over-parameterized Tensor Decomposition.
Xiang Wang, Chenwei Wu, Jason D. Lee, Tengyu Ma, and Rong Ge.
NeurIPS 2020.
Convergence of Meta-Learning with Task-Specific Adaptation over Partial Parameters.
Kaiyi Ji, Jason D. Lee, Yingbin Liang, H. Vincent Poor.
NeurIPS 2020.
Generalized Leverage Score Sampling for Neural Networks.
Zheng Yu, Ruoqi Shen, Zhao Song, Jason D. Lee, and Mengdi Wang.
NeurIPS 2020.
How to Characterize The Landscape of Overparameterized Convolutional Neural Networks.
Weizhong Zhang, Yihong Gu, Cong Fang, Jason D. Lee, and Tong Zhang.
NeurIPS 2020.
SGD Learns One-Layer Networks in WGANs.
Qi Lei, Jason D. Lee, Alex Dimakis, and Costis Daskalakis.
ICML 2020.
Optimal transport mapping via input convex neural networks.
Ashok Vardhan Makkuva, Amirhossein Taghvaei, Sewoong Oh, and Jason D. Lee.
ICML 2020.
On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift.
Alekh Agarwal, Sham M. Kakade, Jason D. Lee, and Gaurav Mahajan.
Short version at COLT 2020.
Kernel and Rich Regimes in Overparametrized Models.
Blake Woodworth, Suriya Gunasekar, Jason D. Lee, Daniel Soudry, and Nathan Srebro.
COLT 2020.
Beyond Linearization: On Quadratic and Higher-Order Approximation of Wide Neural Networks.
Yu Bai and Jason D. Lee.
ICLR 2020.
Steepest Descent Neural Architecture Optimization: Escaping Local Optimum with Signed Neural Splitting.
Lemeng Wu, Mao Ye, Qi Lei, Jason D. Lee, and Qiang Liu.
When Does Non-Orthogonal Tensor Decomposition Have No Spurious Local Minima?
Maziar Sanjabi, Sina Baharlouei, Meisam Razaviyayn, and Jason D. Lee.
Convergence of Adversarial Training in Overparametrized Networks.
Ruiqi Gao, Tianle Cai, Haochuan Li, Liwei Wang, Cho-Jui Hsieh, and Jason D. Lee.
NeurIPS 2019.
Regularization Matters: Generalization and Optimization of Neural Nets v.s. their Induced Kernel.
Colin Wei, Jason D. Lee, Qiang Liu, and Tengyu Ma.
Previously titled “On the Margin Theory of Feedforward Neural Networks”; the new version adds a lower bound for the NTK family of kernels.
NeurIPS 2019.
Solving a Class of Non-Convex Min-Max Games Using Iterative First Order Methods.
Maher Nouiehed, Maziar Sanjabi, Jason D. Lee, and Meisam Razaviyayn.
This supersedes our earlier note.
NeurIPS 2019.
Neural Temporal-Difference Learning Converges to Global Optima.
Qi Cai, Zhuoran Yang, Jason D. Lee, and Zhaoran Wang.
NeurIPS 2019.
Incremental Methods for Weakly Convex Optimization.
Xiao Li, Zhihui Zhu, Anthony Man-Cho So, Jason D. Lee.
Gradient Descent Finds Global Minima of Deep Neural Networks.
Simon S. Du, Jason D. Lee, Haochuan Li, Liwei Wang, and Xiyu Zhai.
ICML 2019.
Lexicographic and Depth-Sensitive Margins in Homogeneous and Non-Homogeneous Training.
Mor Shpigel Nacson, Suriya Gunasekar, Jason D. Lee, Nathan Srebro, and Daniel Soudry.
ICML 2019.
Convergence to Second-Order Stationarity for Constrained Non-Convex Optimization.
Maher Nouiehed, Jason D. Lee, and Meisam Razaviyayn.
Algorithmic Regularization in Learning Deep Homogeneous Models: Layers are Automatically Balanced.
Simon S. Du, Wei Hu, and Jason D. Lee.
Best Paper Award at the ICML 2018 Workshop on Nonconvex Optimization for ML.
NeurIPS 2018.
Provably Correct Automatic Sub-Differentiation for Qualified Programs.
Sham Kakade and Jason D. Lee.
NeurIPS 2018.
Solving Approximate Wasserstein GANs to Stationarity.
Maziar Sanjabi, Jimmy Ba, Meisam Razaviyayn, and Jason D. Lee.
NeurIPS 2018.
Implicit Bias of Gradient Descent on Linear Convolutional Networks.
Suriya Gunasekar, Jason D. Lee, Daniel Soudry, and Nathan Srebro.
NeurIPS 2018.
Adding One Neuron Can Eliminate All Bad Local Minima.
Shiyu Liang, Ruoyu Sun, Jason D. Lee, and R. Srikant.
NeurIPS 2018.
Note: this result only guarantees that all local minimizers of finite norm are eliminated; the regularizer pushes local minimizers out to infinite norm. It does NOT guarantee that SGD or gradient descent converges to the global minimizer.
Convergence of Gradient Descent on Separable Data.
Mor Shpigel Nacson, Jason D. Lee, Suriya Gunasekar, Nathan Srebro, and Daniel Soudry.
AISTATS 2019.
On the Power of Over-parametrization in Neural Networks with Quadratic Activation.
Simon S. Du and Jason D. Lee.
ICML 2018.
Characterizing Implicit Bias in Terms of Optimization Geometry.
Suriya Gunasekar, Jason D. Lee, Daniel Soudry, and Nathan Srebro.
ICML 2018.
Gradient Primal-Dual Algorithm Converges to Second-Order Stationary Solutions for Nonconvex Distributed Optimization.
Mingyi Hong, Jason D. Lee, and Meisam Razaviyayn.
ICML 2018.
Gradient Descent Learns One-hidden-layer CNN: Don't be Afraid of Spurious Local Minima.
Simon S. Du, Jason D. Lee, Yuandong Tian, Barnabas Poczos, and Aarti Singh.
ICML 2018.
Learning One-hidden-layer Neural Networks with Landscape Design.
Rong Ge, Jason D. Lee, and Tengyu Ma.
ICLR 2018.
When is a Convolutional Filter Easy To Learn?
Simon S. Du, Jason D. Lee, and Yuandong Tian.
ICLR 2018.
No Spurious Local Minima in a Two Hidden Unit ReLU Network.
Jiajun Luo, Chenwei Wu, and Jason D. Lee.
An inexact subsampled proximal Newton-type method for large-scale machine learning.
Xuanqing Liu, Cho-Jui Hsieh, Jason D. Lee, and Yuekai Sun.
Gradient Descent Can Take Exponential Time to Escape Saddle Points.
Simon S. Du, Chi Jin, Jason D. Lee, Michael I. Jordan, Barnabas Poczos, and Aarti Singh.
Neural Information Processing Systems (NIPS 2017).
Learning Halfspaces and Neural Networks with Random Initialization.
Yuchen Zhang, Jason D. Lee, Martin Wainwright, and Michael I. Jordan.
AI & Statistics (AISTATS 2017).
Black-box Importance Sampling.
Qiang Liu and Jason D. Lee.
AI & Statistics (AISTATS 2017).
Sketching Meets Random Projection in the Dual: A Provable Recovery Algorithm for Big and High-dimensional Data.
Jialei Wang, Jason D. Lee, Mehrdad Mahdavi, Mladen Kolar, and Nathan Srebro.
AI & Statistics (AISTATS 2017).
Gradient Descent Converges to Minimizers.
Jason D. Lee, Max Simchowitz, Michael I. Jordan, and Benjamin Recht.
Conference on Learning Theory (COLT 2016).
Matrix Completion has No Spurious Local Minimum.
Rong Ge, Jason D. Lee, and Tengyu Ma.
Neural Information Processing Systems (NIPS 2016), Best Student Paper Award.
L1-regularized Neural Networks are Improperly Learnable in Polynomial Time.
Yuchen Zhang, Jason D. Lee, and Michael I. Jordan.
International Conference on Machine Learning (ICML 2016).
A Kernelized Stein Discrepancy for Goodness-of-fit Tests and Model Evaluation.
Qiang Liu, Jason D. Lee, and Michael I. Jordan.
International Conference on Machine Learning (ICML 2016).
Evaluating the Statistical Significance of Submatrices.
Jason D. Lee, Yuekai Sun, and Jonathan Taylor.
Neural Information Processing Systems (NIPS 2015).
Exact Post Model Selection Inference for Marginal Screening.
Jason D. Lee and Jonathan Taylor.
Neural Information Processing Systems (NIPS 2014).
Scalable Methods for Nonnegative Matrix Factorizations of Near-Separable Tall-and-Skinny Matrices.
Austin R. Benson, Jason D. Lee, Bartek Rajwa, and David F. Gleich.
Neural Information Processing Systems (NIPS 2014), Spotlight.
[Code] [Heat Transfer Data] [Flow cytometry data]
Using Multiple Samples to Learn Mixture Models.
Jason D. Lee, Ran Gilad-Bachrach, and Rich Caruana.
Neural Information Processing Systems (NIPS 2013), Spotlight.
On Model Selection Consistency of M-Estimators with Geometrically Decomposable Penalties.
Jason D. Lee, Yuekai Sun, and Jonathan Taylor.
Neural Information Processing Systems (NIPS 2013).
Extended version available on arXiv.
Learning Mixed Graphical Models.
Jason D. Lee and Trevor Hastie.
AI & Statistics (AISTATS) 2013.
Extended version available on arXiv. Project Homepage.
Proximal Newton-type Methods for Minimizing Convex Objective Functions in Composite Form.
Jason D. Lee, Yuekai Sun, and Michael Saunders.
Neural Information Processing Systems (NIPS 2012).
[arXiv] [Code].
Practical Large Scale Optimization for Max-norm Regularization.
Jason D. Lee, Benjamin Recht, Ruslan Salakhutdinov, Nati Srebro, and Joel Tropp.
Neural Information Processing Systems (NIPS 2010).
Multiscale Dynamic Graphs.
Jason D. Lee and Mauro Maggioni.
Sampling Theory and its Applications (SAMPTA) 2011.
Generalized DCell Structure for Load-Balanced Data Center Networks.
Markus Kliegl, Jason D. Lee, Jun Li, Xinchao Zhang, Chuanxiong Guo, and David Rincon.
IEEE International Conference on Computer Communications (INFOCOM 2010).
Estimation of Intrinsic Dimensionality of Samples from Noisy Low-Dimensional Manifolds in High Dimensions with Multiscale SVD.
Anna V. Little, Jason D. Lee, and Mauro Maggioni.
IEEE Statistical Signal Processing Workshop (SSP 2009).
Journal Publications
Neural Q-Learning Converges to Global Optima.
Qi Cai, Zhuoran Yang, Jason D. Lee, and Zhaoran Wang.
Mathematics of Operations Research (short version at NeurIPS 2019).
Linearized ADMM Converges to Second-Order Stationary Points for Non-Convex Problems.
Songtao Lu, Jason D. Lee, Meisam Razaviyayn, Mingyi Hong.
IEEE Transactions on Signal Processing.
Distributed Estimation for Principal Component Analysis: a Gap-free Approach.
Xi Chen, Jason D. Lee, He Li, and Yun Yang.
Journal of the American Statistical Association.
A Flexible Framework for Hypothesis Testing in High Dimensions.
Adel Javanmard and Jason D. Lee.
Journal of the Royal Statistical Society Series B.
Statistical Inference for Model Parameters in Stochastic Gradient Descent.
Xi Chen, Jason D. Lee, Xin T. Tong, and Yichen Zhang.
Annals of Statistics.
First-order Methods Almost Always Avoid Saddle Points.
Jason D. Lee, Ioannis Panageas, Georgios Piliouras, Max Simchowitz, Michael I. Jordan, and Benjamin Recht.
Mathematical Programming.
Stochastic subgradient method converges on tame functions.
Damek Davis, Dmitriy Drusvyatskiy, Sham Kakade, and Jason D. Lee.
Foundations of Computational Mathematics.
Theoretical insights into the optimization landscape of over-parameterized shallow neural networks.
Mahdi Soltanolkotabi, Adel Javanmard, and Jason D. Lee.
IEEE Transactions on Information Theory 2018.
Communication-efficient distributed statistical learning.
Michael I. Jordan, Jason D. Lee, and Yun Yang.
Journal of the American Statistical Association 2018.
Distributed Stochastic Variance Reduced Gradient Methods.
Jason D. Lee, Qihang Lin, Tengyu Ma, and Tianbao Yang.
Journal of Machine Learning Research 2017.
Communication-Efficient Distributed Sparse Regression.
Jason D. Lee, Qiang Liu, Yuekai Sun, and Jonathan Taylor.
Journal of Machine Learning Research 2017.
Sketching Meets Random Projection in the Dual: A Provable Recovery Algorithm for Big and High-dimensional Data.
Jialei Wang, Jason D. Lee, Mehrdad Mahdavi, Mladen Kolar, and Nathan Srebro.
Electronic Journal of Statistics 2017.
Exact Post-Selection Inference with the Lasso.
Jason D. Lee, Dennis L Sun, Yuekai Sun, and Jonathan Taylor.
Annals of Statistics 2016.
Selective Inference and Learning Mixed Graphical Models.
PhD Thesis.
Chapter 3 contains unpublished results on combining selective inference with the debiased lasso and knockoffs, and proposes a general algorithm for selective inference.
Matrix Completion and Low-Rank SVD via Fast Alternating Least Squares.
Trevor Hastie, Rahul Mazumder, Jason D. Lee, and Reza Zadeh.
Journal of Machine Learning Research 2015.
[Spark Implementation] [R Implementation]
On Model Selection Consistency of M-Estimators with Geometrically Decomposable Penalties.
Jason D. Lee, Yuekai Sun, and Jonathan Taylor.
Electronic Journal of Statistics 2015.
Learning the Structure of Mixed Graphical Models.
Jason D. Lee and Trevor Hastie.
Journal of Computational and Graphical Statistics 2014.
[Code] [arXiv]
Proximal Newton-type Methods for Minimizing Convex Objective Functions in Composite Form.
Jason D. Lee, Yuekai Sun, and Michael Saunders.
SIAM Journal on Optimization 2014.
PNOPT package
Other Publications
Analysis of Inexact Proximal Newton-type Methods.
Jason D. Lee, Yuekai Sun, and Michael Saunders.
Neural Information Processing Systems Optimization and Machine Learning Workshop 2012.
Learning Structured Matrices.
Jason D. Lee, Carlos Sing-Long, and Yuekai Sun.
Class Project for Discrete Math and Advanced Topics in Convex Optimization.
Multiclass Clustering using a Semidefinite Relaxation.
Jason D. Lee.
Class Project for Machine Learning.
Multiscale Estimation of Intrinsic Dimensionality of Point Cloud Data and Multiscale Analysis of Dynamic Graphs.
Jason D. Lee, Advisor: Mauro Maggioni.
Awarded Graduation with High Distinction.
The Generalized DCell Network Structures and Their Graph Properties.
Markus Kliegl, Jason Lee, Jun Li, Xinchao Zhang, Chuanxiong Guo, and David Rincon.
Microsoft Research Technical Report, October 2009.
Existence of Asymptotic Solutions to Semilinear Partial Difference Equations on Graphs.
Jason D. Lee and John Neuberger.
AMS-MAA Joint Mathematics Meetings 2008.