Larry, Yinxi Li

yinxi.li[at]uwaterloo.ca | Office DC 2555

Carpe diem, com respeito e amor

🦄 About Me | Currently a Master’s Student in CS

Hey there, I’m Larry! Thanks for visiting my humble page. I’m currently a first-year Computer Science M.Math. student at the University of Waterloo, advised by Pengyu Nie. Before that, I received my B.Sc. in Computer Science (ELITE Stream) from The Chinese University of Hong Kong, where my final-year project was advised by Eric Lo. I also spent a semester as an exchange student at ETH Zürich during my undergraduate studies, where I explored advanced topics in Machine Learning and Software Engineering.

💻 Research & Technical Interests

  • NLP - Tokenization
  • LLM for Software Engineering, LLM for Code, LLM for Math

🥂 Always Open for a Chat! I’m always happy to discuss anything, whether it’s research-related or just a casual conversation. Feel free to reach out to me, as long as you come in with a friendly attitude 👉👈.

news

Oct 17, 2025 📌 New preprint: TokDrift: When LLM Speaks in Subwords but Code Speaks in Grammar is now available on arXiv! 1️⃣ LLMs’ subword tokenizers don’t align well with programming language grammar: tiny whitespace or renaming tweaks -> different tokenization -> flipped outputs. 2️⃣ Our framework TokDrift systematically tests 9 code LLMs on 3 tasks, showing their sensitivity to tokenization changes: up to 60% of outputs change under a single semantic-preserving rewrite. 3️⃣ If your win margin is ~1 pp, beware: spacing and naming can swing results.
Feb 15, 2025 Acknowledgements
Feb 15, 2025 Hello, my personal website! Let’s celebrate its birthday 😍. Glad to see you here, though the site is still under construction. Hopefully it will be done soon.

selected publications

  1. TokDrift: When LLM Speaks in Subwords but Code Speaks in Grammar
    Yinxi Li, Yuntian Deng, and Pengyu Nie
    arXiv preprint arXiv:2510.14972, 2025