news

Oct 17, 2025 📌 New preprint: TokDrift: When LLM Speaks in Subwords but Code Speaks in Grammar is now available on arXiv! 1️⃣ LLMs’ subword tokenizers don’t align well with programming language grammar: tiny whitespace or renaming tweaks -> different tokenization -> flipped outputs. 2️⃣ Our framework TokDrift systematically tests 9 code LLMs on 3 tasks, showing their sensitivity to tokenization changes: up to 60% of outputs change under a single semantic-preserving rewrite. 3️⃣ If your win margin is ~1 percentage point, beware: spacing & naming can swing results.
Feb 15, 2025 Acknowledgements
Feb 15, 2025 Hello, my personal website! Let’s celebrate its birthday 😍. Glad to see you here, but it is still under construction. Hopefully it will be done soon.