Jiawei Zhang

I am currently pursuing my Ph.D. in Computer Science at the University of Chicago, where I serve as a research assistant in the Secure Learning Lab under the guidance of Prof. Bo Li. Before joining UChicago, I began my Ph.D. studies at the Siebel School of Computing and Data Science at the University of Illinois Urbana-Champaign, where I also earned my M.S. in Computer Science. Before that, I completed my Bachelor's degree at Zhejiang University.

I study Safe AGI, especially how to make LLMs and LLM agents reliable at scale, and I explore world models to improve downstream safe policy learning. My research runs as a closed loop: (i) scalable red-teaming that elicits realistic, long-horizon failures (with Virtue AI, I participated in internal red-teaming evaluations of OpenAI o1, Google DeepMind models, and ElevenLabs TTS); (ii) theoretical guarantees via certified robustness that turn safety into provable design constraints; (iii) scalable, interpretability-guided alignment that translates circuit-level insights into generalizable mitigations; and (iv) safety-aware reasoning for autonomous driving that leverages additional world-model knowledge through RL. Overall, my goal is to equip LLMs with stronger reasoning via world models and greater safety awareness, while ensuring scalability.

I have been selected as an Anthropic Fellow for AI safety research, starting in January 2026!

Email  /  CV  /  Google Scholar  /  Twitter  /  GitHub

"Man is but a reed, the most feeble thing in nature; but he is a thinking reed."

Industry Experience
NVIDIA Research
Autonomous Driving Research Intern, Santa Clara
Oct 2025 – Present
Advised by Dr. Boris Ivanovic and Prof. Marco Pavone
ByteDance Seed Research
Responsible AI Research Intern, San Jose
June 2025 – Sept 2025
Advised by Dr. Xiaojun Xu and Dr. Hang Li
Meta AI
GenAI Research Collaborator (External)
Sept 2024 – June 2025
Advised by Dr. Shuang Yang
Nuro AI
Machine Learning Research Intern (Pathfinder), Mountain View
May 2024 – Aug 2024
Advised by Dr. Aleksandr Petiushko
Sea AI Lab
Machine Learning Research Intern, Singapore
May 2023 – Aug 2023
Advised by Dr. Tianyu Pang and Dr. Chao Du
Selected Publications (* denotes co-first authorship)
2025
Any-Depth Alignment: Unlocking Innate Safety Alignment of LLMs to Any-Depth
Jiawei Zhang, Andrew Estornell, David D. Baek, Bo Li, Xiaojun Xu
arXiv 2025
ARMs: Adaptive Red-Teaming Agent against Multimodal Models with Plug-and-Play Attacks
Zhaorun Chen*, Xun Liu*, Mintong Kang, Jiawei Zhang, Minzhou Pan, Shuang Yang, Bo Li
arXiv 2025
GraphQ-LM: Scalable Graph Representation for Large Language Models via Residual Vector Quantization
Jiawei Zhang, Yang Yang, Kaushik Rangadurai, Tao Liu, Minhui Huang, Yiping Han, Bo Li, Shuang Yang
arXiv 2025
GuardSet-X: Massive Multi-Domain Safety Policy-Grounded Guardrail Dataset
Mintong Kang*, Zhaorun Chen*, Chejian Xu*, Jiawei Zhang*, Chengquan Guo*, Minzhou Pan, Ivan Revilla, Yu Sun, Bo Li
NeurIPS 2025
UDora: A Unified Red Teaming Framework Against LLM Agents by Dynamically Leveraging Their Own Reasoning
Jiawei Zhang, Shuang Yang, Bo Li
ICML 2025
SafeAuto: Knowledge-Enhanced Safe Autonomous Driving with Multimodal Foundation Models
Jiawei Zhang, Xuan Yang, Taiqi Wang, Yu Yao, Aleksandr Petiushko, Bo Li
ICML 2025
AdvWeb: Controllable Black-Box Attacks on VLM-Powered Web Agents
Chejian Xu, Mintong Kang, Jiawei Zhang, Zeyi Liao, Lingbo Mo, Mengqi Yuan, Huan Sun, Bo Li
ICML 2025
GuardAgent: Safeguard LLM Agents by a Guard Agent via Knowledge-Enabled Reasoning
Zhen Xiang, Linzhi Zheng, Yanjie Li, Junyuan Hong, Qinbin Li, Han Xie, Jiawei Zhang, Zidi Xiong, Chulin Xie, Carl Yang, Dawn Song, Bo Li
ICML 2025
MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models
Chejian Xu*, Jiawei Zhang*, Zhaorun Chen*, Chulin Xie*, Mintong Kang*, Zhuowen Yuan*, Alexander Xiong, Zidi Xiong, Chenhui Zhang, Lingzhi Yuan, Yi Zeng, Peiyang Xu, Chengquan Guo, Andy Zhou, Jeffrey Ziwei Tan, Xuandong Zhao, Francesco Pinto, Zhen Xiang, Yu Gai, Zinan Lin, Dan Hendrycks, Bo Li, Dawn Song
ICLR 2025
EIA: Environmental Injection Attack on Generalist Web Agents for Privacy Leakage
Zeyi Liao, Lingbo Mo, Chejian Xu, Mintong Kang, Jiawei Zhang, Chaowei Xiao, Yuan Tian, Bo Li, Huan Sun
ICLR 2025
KnowHalu: Hallucination Detection via Multi-Form Knowledge Based Factual Checking
Jiawei Zhang, Chejian Xu, Yu Gai, Freddy Lecue, Dawn Song, Bo Li
ICLR 2025 Workshop on Foundation Models in the Wild
2024
Graph Chain-of-Thought: Augmenting Large Language Models by Reasoning on Graphs
Bowen Jin, Chulin Xie, Jiawei Zhang, Kashob Kumar Roy, Yu Zhang, Zheng Li, Ruirui Li, Xianfeng Tang, Suhang Wang, Yu Meng, Jiawei Han
ACL 2024
ChatScene: Knowledge-Enabled Safety-Critical Scenario Generation for Autonomous Vehicles
Jiawei Zhang, Chejian Xu, Bo Li
CVPR 2024
MMCBench: Benchmarking Large Multimodal Models against Common Corruptions
Jiawei Zhang, Tianyu Pang, Chao Du, Yi Ren, Bo Li, Min Lin
arXiv 2024
2023
DiffSmooth: Certifiably Robust Learning via Diffusion Models and Local Smoothing
Jiawei Zhang, Zhongzhu Chen, Huan Zhang, Chaowei Xiao, Bo Li
32nd USENIX Security Symposium (USENIX Security) 2023
CARE: Certifiably Robust Learning with Reasoning via Variational Inference
Jiawei Zhang, Linyi Li, Ce Zhang, Bo Li
IEEE Conference on Secure and Trustworthy Machine Learning (SaTML) 2023
2022
Improving Certified Robustness via Statistical Learning with Logical Reasoning
Zhuolin Yang*, Zhikuan Zhao*, Boxin Wang, Jiawei Zhang, Linyi Li, Hengzhi Pei, Bojan Karlas, Ji Liu, Heng Guo, Ce Zhang, Bo Li
NeurIPS 2022
Double Sampling Randomized Smoothing
Linyi Li, Jiawei Zhang, Tao Xie, Bo Li
ICML 2022
2021
Progressive-Scale Boundary Blackbox Attack via Projective Gradient Estimation
Jiawei Zhang*, Linyi Li*, Huichen Li, Xiaolu Zhang, Shuang Yang, Bo Li
ICML 2021
Professional Service
  • Session Chair — ICLR 2025
  • Top Reviewer — NeurIPS 2025
  • Program Committee — ICML, NeurIPS, ICLR, CVPR, ECCV, AISTATS, AAAI, COLM, ACL, JMLR, etc.

Template from Jon Barron.