Academic Homepage

Zhe Yang

Master's student in Artificial Intelligence (MSAI) at Nanyang Technological University, working on multimodal and agent-related research with Associate Professor Ziwei Liu at MMLab@NTU.

My current research focuses on multimodal learning, agent systems, and personalized AI. I previously worked on LLM memorization, data privacy, and tokenization-related security risks with Prof. Michael R. Lyu at CUHK.

  • Singapore
  • MSAI at NTU since August 2025
  • 3 papers currently under review
About

Multimodal AI, agent systems, and grounded personalization

I am pursuing the Master of Science in Artificial Intelligence (MSAI) at Nanyang Technological University. Since August 2025, I have been working with Associate Professor Ziwei Liu at MMLab@NTU on papers and research projects focused on multimodal systems and agentic intelligence.

Before NTU, I received my B.Sc. in Computer Science from The Chinese University of Hong Kong, graduating with a CGPA of 3.785/4.0. My earlier work centered on understanding memorization in large language models, with emphasis on data compressibility, tokenization, and security risks in code LLMs.

Education
  • 2025.08 - Present M.S. in Artificial Intelligence, NTU
  • 2021.09 - 2025.07 B.Sc. in Computer Science, CUHK
  • 2023.07 Summer School, Peking University
Research Interests
  • Multimodal learning and grounding
  • Agent systems and contextual reasoning
  • Personalized AI and memory-centric systems
  • LLM memorization, privacy, and security
Research

Current directions

Multimodal Agents

Designing systems that can search, perceive, and reason over heterogeneous digital environments, with emphasis on grounded multimodal interaction.

Personalized AI

Building and evaluating systems that recover user-level context from long-horizon, file-system-scale behavioral and multimodal traces.

LLM Memorization and Security

Studying how data properties and tokenization affect memorization, leakage risk, and privacy vulnerabilities in large language models.

Publications

Selected papers

The papers below include recent work currently under review and one earlier publication. Of the four papers listed here, three are in the review cycle and one was published in 2024.

Under Review 2026

FileGram: Grounding Agent Personalization in File-System Behavioral Traces

Shuai Liu, Shulin Tian, Kairui Hu, Yuhao Dong, Zhe Yang, Bo Li, Jingkang Yang, Chen Change Loy, Ziwei Liu

A framework for grounding agent memory and personalization in file-system behavioral traces, spanning data generation, benchmarking, and memory architecture.

Under Review 2026

Data Compressibility Quantifies LLM Memorization

Yizhan Huang, Zhe Yang, Meifang Chen, HUANG Nianchen, Jianping Zhang, Michael R. Lyu

This work studies how data compressibility relates to memorization in LLMs and proposes a quantitative perspective on memorization behavior.

Under Review 2026

Understanding Secret Leakage Risks in Code LLMs: A Tokenization Perspective

Meifang Chen, Zhe Yang, HUANG Nianchen, Yizhan Huang, Yichen Li, Zihan Li, Michael R. Lyu

An investigation into how BPE tokenization contributes to secret memorization and leakage risks in code LLMs through what we term gibberish bias.

Published 2024

RAUIE: A Relation-Augmented Document-level Event Extraction Model Based on UIE

Co-author

Presented at AINIT 2024. Earlier work on document-level information extraction.

Experience

Research and training

2025.08 - Present

Research with Associate Professor Ziwei Liu, MMLab@NTU

Working on multimodal and agent-oriented papers and research projects while pursuing the M.S. in AI at NTU.

2024.09 - 2025.07

Final-Year Research with Prof. Michael R. Lyu

Focused on LLM memorization, entropy-based characterization, dataset inference, and tokenization-related security risks in code LLMs.

2024.07 - 2024.08

Algorithm Intern, Geovis Technology

Worked on automated data acquisition, knowledge graph modeling, fine-tuning, and graph-based analysis pipelines.

2024.06 - 2024.09

UG Summer Research Internship

Evaluated memorization difficulty in large language models through entropy, perplexity, and memorization-rate analysis on open-source models.

2023.06 - 2023.09

UG Summer Research Internship

Studied adversarial attacks on gender recognition systems and analyzed robustness and fairness issues in facial-recognition APIs.

Honors

Selected awards

2025

HKSAR Government Scholarship

Top 1% in Computer Science.

2022 - 2025

Dean's List

Top 10% in the Faculty of Engineering, CUHK.

2024

CSE Scholarship Silver Award

Top 1% in the Faculty of Engineering, CUHK.

Contact

Get in touch

Email zhe012@e.ntu.edu.sg

Location Singapore

If you would like to discuss multimodal learning, agent systems, personalized AI, or LLM memorization and security, feel free to reach out.