Aston Zhang

Aston Zhang is the long context lead of Llama at Meta. His recent contributions include Llama 4’s 10M-token multimodal context with the iRoPE architecture. His work has been recognized with the ICLR Outstanding Paper Award, the ACM Ubicomp Distinguished Paper Award, and an ACM SenSys Best Paper Award nomination. His textbook, “Dive into Deep Learning,” is adopted worldwide. He earned his Ph.D. in Computer Science from UIUC.

News

Excited to share Llama 4’s 10M-token multimodal context (20+ hours of video) with the iRoPE architecture!
Llama 3.1 405B is now openly available.
Meet Llama 3, our state-of-the-art open source large language model. Check out my developer podcast.

Books

A. Zhang, Z. C. Lipton, M. Li, and A. J. Smola
Dive into Deep Learning
Cambridge University Press, 2023
- Adopted at 500 universities from 70 countries
- Featured in the AWS re:Invent keynote by Swami, Head of AWS AI, Database, and Analytics
A. Zhang, M. Li, Z. C. Lipton, and A. J. Smola
动手学深度学习
人民邮电出版社, 2nd ed., 2023, 1st ed., 2019
- Best seller in China

Papers (All)

M. Zhong*, A. Zhang*, X. Wang, R. Hou, W. Xiong, C. Zhu, Z. Chen, L. Tan, C. Bi, M. Lewis, S. Popuri, S. Narang, M. Kambadur, D. Mahajan, S. Edunov, J. Han, and L. van der Maaten (*equal contribution)
Law of the Weakest Link: Cross Capabilities of Large Language Models
In Proceedings of the International Conference on Learning Representations (ICLR), 2025
llm-cross-capabilities.org
J. Kim, A. Goyal, A. Zhang, B. Xiong, R. Hou, M. Kambadur, D. Mahajan, H. Hajishirzi, and L. Tan
A Systematic Examination of Preference Learning through the Lens of Instruction-Following
In Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL), 2025
Y. Yu, Z. Chen, A. Zhang, L. Tan, C. Zhu, R. Y. Pang, Y. Qian, X. Wang, S. Gururangan, C. Zhang, M. Kambadur, D. Mahajan, and R. Hou
Self-Generated Critiques Boost Reward Modeling for Language Models
In Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL), 2025
Llama Team, AI@Meta (Core Contributor)
The Llama 3 Herd of Models
2024
Z. Zhang and A. Zhang
You Only Look at Screens: Multimodal Chain-of-Action Agents
In Findings of the Association for Computational Linguistics (ACL), 2024
Z. Zhang, A. Zhang, M. Li, H. Zhao, G. Karypis, and A. J. Smola
Multimodal Chain-of-Thought Reasoning in Language Models
In Transactions on Machine Learning Research (TMLR), 2024
[Idea Inspiration by Homeschooling]
S. Ren, A. Zhang, Y. Zhu, S. Zhang, S. Zheng, M. Li, A. J. Smola, X. Sun
Prompt Pre-Training with Twenty-Thousand Classes for Open-Vocabulary Visual Recognition
In Proceedings of the Conference on Neural Information Processing Systems (NeurIPS), 2023
Z. Zeng, C. Hawkins, M. Hong, A. Zhang, N. Pappas, V. Singh, and S. Zheng
Scaling Transformers to 128K Tokens or More by Prioritizing Important Tokens
In Proceedings of the Conference on Neural Information Processing Systems (NeurIPS), 2023
J. Chen, A. Zhang, X. Shi, M. Li, A. J. Smola, and D. Yang
Parameter-Efficient Fine-Tuning Design Spaces
In Proceedings of the International Conference on Learning Representations (ICLR), 2023
Z. Zhang, A. Zhang, M. Li, and A. J. Smola
Automatic Chain of Thought Prompting in Large Language Models
In Proceedings of the International Conference on Learning Representations (ICLR), 2023
Z. Liu, Z. Tang, X. Shi, A. Zhang, M. Li, A. Shrivastava, and A. Wilson
Learning Multimodal Data Augmentation in Feature Space
In Proceedings of the International Conference on Learning Representations (ICLR), 2023
T. Yang, Y. Zhu, Y. Xie, A. Zhang, C. Chen, and M. Li
AIM: Adapting Image Models for Efficient Video Understanding
In Proceedings of the International Conference on Learning Representations (ICLR), 2023
H. Wang, A. Zhang, Y. Zhu, S. Zheng, M. Li, A. J. Smola, and Z. Wang
Partial and Asymmetric Contrastive Learning for Out-of-Distribution Detection in Long-Tailed Recognition
In Proceedings of International Conference on Machine Learning (ICML, Long Presentation), 2022
H. Wang, A. Zhang, S. Zheng, X. Shi, M. Li, and Z. Wang
Removing Batch Normalization Boosts Adversarial Training
In Proceedings of International Conference on Machine Learning (ICML), 2022
A. Zhang, Y. Tay, S. Zhang, A. Chan, A. T. Luu, S. C. Hui, and J. Fu
Beyond Fully-Connected Layers with Quaternions: Parameterization of Hypercomplex Multiplications with 1/n Parameters
In Proceedings of the International Conference on Learning Representations (ICLR, Outstanding Paper Award), 2021

Tutorials

with A. J. Smola
Attention in Deep Learning [Keynote] [PDF] [Video]
In The 36th International Conference on Machine Learning (ICML), 2019

Services

Area Chair
- Annual Meeting of the Association for Computational Linguistics (ACL)
- Conference on Empirical Methods in Natural Language Processing (EMNLP)
- International Conference on Computational Linguistics (COLING)

Follow @astonzhangAZ
Tweets by astonzhangAZ