Jiawei Zhang
I am currently pursuing my Ph.D. in Computer Science at the University of Chicago, where I serve as a research assistant in the Secure Learning Lab under the guidance of Prof. Bo Li. Before joining UChicago, I began my Ph.D. studies at the Siebel School of Computing and Data Science at the University of Illinois Urbana-Champaign, where I also earned my M.S. in Computer Science. Prior to that, I completed my Bachelor's degree at Zhejiang University. I have also interned at NVIDIA Research, Bytedance Seed, Nuro AI, and Sea AI, and I work closely with Meta GenAI.
I study how to make LLMs and LLM agents reliable at scale. My research runs as a closed loop:
(i) scalable red-teaming that elicits realistic, long-horizon failures: with Virtue AI, I have participated in internal red-teaming evaluations of OpenAI o1, Google DeepMind models, and ElevenLabs TTS;
(ii) theoretical guarantees via certified robustness that turn safety into provable design constraints;
(iii) scalable, interpretability-guided defenses that translate circuit-level insights into generalizable mitigations; and
(iv) closed-loop safety guarantees for autonomous driving, where perception–language–action coupling magnifies risk.
Together, these pillars elevate safety from ad hoc patches to engineerable, provable, and generalizable methods.
Email / CV / Google Scholar / Twitter / Github
"Man is but a reed, the most feeble thing in nature; but he is a thinking reed."
Selected Publications (* denotes co-first authorship)
2025
Any-Depth Alignment: Unlocking Innate Safety Alignment of LLMs to Any-Depth
Jiawei Zhang, Andrew Estornell, David D. Baek, Bo Li, Xiaojun Xu
arXiv 2025
GraphQ-LM: Scalable Graph Representation for Large Language Models via Residual Vector Quantization
Jiawei Zhang, Yang Yang, Kaushik Rangadurai, Tao Liu, Minhui Huang, Yiping Han, Bo Li, Shuang Yang
arXiv 2025
GuardSet-X: Massive Multi-Domain Safety Policy-Grounded Guardrail Dataset
Mintong Kang*, Zhaorun Chen*, Chejian Xu*, Jiawei Zhang*, Chengquan Guo*, Minzhou Pan, Ivan Revilla, Yu Sun, Bo Li
NeurIPS 2025
UDora: A Unified Red Teaming Framework Against LLM Agents by Dynamically Leveraging Their Own Reasoning
Jiawei Zhang, Shuang Yang, Bo Li
ICML 2025
SafeAuto: Knowledge-Enhanced Safe Autonomous Driving with Multimodal Foundation Models
Jiawei Zhang, Xuan Yang, Taiqi Wang, Yu Yao, Aleksandr Petiushko, Bo Li
ICML 2025
AdvWeb: Controllable Black-Box Attacks on VLM-Powered Web Agents
Chejian Xu, Mintong Kang, Jiawei Zhang, Zeyi Liao, Lingbo Mo, Mengqi Yuan, Huan Sun, Bo Li
ICML 2025
Guardagent: Safeguard LLM Agents by a Guard Agent via Knowledge-Enabled Reasoning
Zhen Xiang, Linzhi Zheng, Yanjie Li, Junyuan Hong, Qinbin Li, Han Xie, Jiawei Zhang, Zidi Xiong, Chulin Xie, Carl Yang, Dawn Song, Bo Li
ICML 2025
MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models
Chejian Xu*, Jiawei Zhang*, Zhaorun Chen*, Chulin Xie*, Mintong Kang*, Zhuowen Yuan*, Alexander Xiong, Zidi Xiong, Chenhui Zhang, Lingzhi Yuan, Yi Zeng, Peiyang Xu, Chengquan Guo, Andy Zhou, Jeffrey Ziwei Tan, Xuandong Zhao, Francesco Pinto, Zhen Xiang, Yu Gai, Zinan Lin, Dan Hendrycks, Bo Li, Dawn Song
ICLR 2025
EIA: Environmental Injection Attack on Generalist Web Agents for Privacy Leakage
Zeyi Liao, Lingbo Mo, Chejian Xu, Mintong Kang, Jiawei Zhang, Chaowei Xiao, Yuan Tian, Bo Li, Huan Sun
ICLR 2025
KnowHalu: Hallucination Detection via Multi-Form Knowledge Based Factual Checking
Jiawei Zhang, Chejian Xu, Yu Gai, Freddy Lecue, Dawn Song, Bo Li
ICLR 2025 Workshop on Foundation Models in the Wild
2024
Graph Chain-of-Thought: Augmenting Large Language Models by Reasoning on Graphs
Bowen Jin, Chulin Xie, Jiawei Zhang, Kashob Kumar Roy, Yu Zhang, Zheng Li, Ruirui Li, Xianfeng Tang, Suhang Wang, Yu Meng, Jiawei Han
ACL 2024
ChatScene: Knowledge-Enabled Safety-Critical Scenario Generation for Autonomous Vehicles
Jiawei Zhang, Chejian Xu, Bo Li
CVPR 2024
MMCBench: Benchmarking Large Multimodal Models against Common Corruptions
Jiawei Zhang, Tianyu Pang, Chao Du, Yi Ren, Bo Li, Min Lin
arXiv 2024
2023
DiffSmooth: Certifiably Robust Learning via Diffusion Models and Local Smoothing
Jiawei Zhang, Zhongzhu Chen, Huan Zhang, Chaowei Xiao, Bo Li
32nd USENIX Security Symposium (USENIX Security) 2023
CARE: Certifiably Robust Learning with Reasoning via Variational Inference
Jiawei Zhang, Linyi Li, Ce Zhang, Bo Li
IEEE Conference on Secure and Trustworthy Machine Learning (SaTML) 2023
2022
Improving Certified Robustness via Statistical Learning with Logical Reasoning
Zhuolin Yang*, Zhikuan Zhao*, Boxin Wang, Jiawei Zhang, Linyi Li, Hengzhi Pei, Bojan Karlas, Ji Liu, Heng Guo, Ce Zhang, Bo Li
NeurIPS 2022
Double Sampling Randomized Smoothing
Linyi Li, Jiawei Zhang, Tao Xie, Bo Li
ICML 2022
2021
Progressive-Scale Boundary Blackbox Attack via Projective Gradient Estimation
Jiawei Zhang*, Linyi Li*, Huichen Li, Xiaolu Zhang, Shuang Yang, Bo Li
ICML 2021
Workshops and Competitions
Professional Service
- Session Chair — ICLR 2025
- Top Reviewer — NeurIPS 2025
- Program Committee — ICML, NeurIPS, ICLR, CVPR, ECCV, AISTATS, AAAI, COLM, ACL, JMLR, etc.