Mohammad Aflah Khan

Research Software Engineer @ MPI-SWS · OSS @ EleutherAI

Hi, I'm Aflah, a research software engineer at the Max Planck Institute for Software Systems. My primary focus is on advancing our understanding of large language models (LLMs), evaluating their capabilities, and developing AI-powered co-pilots to support researchers. Previously, I’ve worked on projects aimed at reducing hate speech on social media and other applications under NLP for social good.

Open to researcher/research engineer/backend engineer roles

Contact me

Experience

Ongoing first (rest sorted by end date)

Max Planck Institute for Software Systems (MPI-SWS)

Research Software Engineer • April, 2024 — Present [Full Time] | Nov, 2023 — March, 2024 [Part time] | Aug, 2023 — Oct, 2023 [Intern]

Working under Dr. Krishna Gummadi to explore different aspects of LLMs. Areas we have explored or are exploring include:

Optimizing pre-training and inference for LLMs
LLM memorization and the impact of Parameter-Efficient Fine-Tuning (PEFT) on memorization
Knowledge acquisition and evaluation of factual knowledge in LLMs
Built and currently maintain key internal tools: OpenChat (an internal chatbot), MaxCast (a research paper-to-podcast conversion service), and MaxChat (a document-based chat service). These services were developed from scratch, including hosting models on-premises and fine-tuning for optimal performance.
Published and submitted research to top-tier (A*) conferences

EleutherAI

Open Source Contributor • Dec, 2022 — Present

Currently working on the Multilingual Natural Instructions project to build a massive instruction tuning corpus for Hindi. Previously worked on:

Pythia - A Suite for Analyzing Large Language Models Across Training and Scaling (Accepted ICML'23) - Contributed substantially to the gender bias evals and intervention case study. The models have over 18 million downloads (as of April 2025)
Recite, Reconstruct, Recollect - Memorization in LMs as a Multifaceted Phenomenon (Accepted ICLR'25) - An intuitive taxonomy to classify memorized sequences and then build predictors based on these classes

Laboratory for Computational Social Systems (LCS2)

Undergraduate Student Researcher • June, 2021 — May, 2024

I've worked on a variety of projects, from hate speech normalization to designing recommendations for fine-tuning improved hate speech detectors. I also led the QUENCH project, a benchmark aimed at evaluating advanced reasoning abilities in large language models, with a particular emphasis on Indic contexts.

Goldman Sachs

Summer Analyst • May, 2023 — July, 2023

Worked in the Finance, Planning & Analysis Engineering division towards revamping the central hub of the department. Also built POCs based on user feedback to improve the search and access experience on the webapp, and received a return offer to join full-time as an Analyst.

Google Summer of Code - TensorFlow

Open Source Developer • May, 2022 — Sept, 2022

Worked with Matthew Watson & Chen Qian on adding support for data augmentation layers to KerasNLP, a library in the Keras/TensorFlow ecosystem that aims to build industry-oriented NLP solutions. I also contributed several bug fixes and other utilities, such as tokenizers and the transformer encoder & decoder.

Publications

* indicates equal contribution

2025

Recite, Reconstruct, Recollect: Memorization in LMs as a Multifaceted Phenomenon

USVSN Sai Prashanth, Alvin Deng, Kyle O'Brien, Jyothir S V, Mohammad Aflah Khan, Jaydeep Borkar, Christopher A. Choquette-Choo, Jacob Ray Fuehne, Stella Biderman, Tracy Ke, Katherine Lee, Naomi Saphra • 2025

ICLR 2025 - The Thirteenth International Conference on Learning Representations

Towards Reliable Latent Knowledge Estimation in LLMs: Zero-Prompt Many-Shot Based Factual Knowledge Extraction

Qinyuan Wu, Mohammad Aflah Khan, Soumi Das, Vedant Nanda, Bishwamittra Ghosh, Camila Kolling, Till Speicher, Laurent Bindschaedler, Krishna P Gummadi, Evimaria Terzi • 2025

WSDM 2025 - Proceedings of the 18th ACM International Conference on Web Search and Data Mining

QUENCH: Measuring the gap between Indic and Non-Indic Contextual General Reasoning in LLMs

Mohammad Aflah Khan*, Neemesh Yadav*, Sarah Masud, Md Shad Akhtar • 2025

COLING 2025 - Proceedings of the 31st International Conference on Computational Linguistics

In Agents We Trust, but Who Do Agents Trust? Latent Source Preferences Steer LLM Generations

Mohammad Aflah Khan, Mahsa Amani, Soumi Das, Bishwamittra Ghosh, Qinyuan Wu, Krishna P. Gummadi, Manish Gupta, Abhilasha Ravichander

R2-FM @ ICML 2025 - Workshop on Reliable and Responsible Foundation Models

Rethinking Memorization Measures in LLMs: Recollection vs. Counterfactual vs. Contextual Memorization

Bishwamittra Ghosh, Soumi Das, Qinyuan Wu, Mohammad Aflah Khan, Krishna P. Gummadi, Evimaria Terzi, Deepak Garg • 2025

MemFM @ ICML 2025 - The Impact of Memorization on Trustworthy Foundation Models

Rote Learning Considered Useful: Generalizing over Memorized Data in LLMs

Qinyuan Wu, Soumi Das, Mahsa Amani, Bishwamittra Ghosh, Mohammad Aflah Khan, Krishna P. Gummadi, Muhammad Bilal Zafar • 2025

MemFM @ ICML 2025 - The Impact of Memorization on Trustworthy Foundation Models

Under Review Projects

In Agents We Trust, but Who Do Agents Trust? Latent Source Preferences Steer LLM Generations

Mohammad Aflah Khan, Mahsa Amani, Soumi Das, Bishwamittra Ghosh, Qinyuan Wu, Krishna P. Gummadi, Manish Gupta, Abhilasha Ravichander

Under Review

Revisiting Privacy, Utility, and Efficiency Trade-offs when Fine-Tuning Large Language Models

Soumi Das, Camila Kolling, Mohammad Aflah Khan, Mahsa Amani, Bishwamittra Ghosh, Qinyuan Wu, Till Speicher, Krishna P. Gummadi

Under Review

Understanding the Mechanics and Dynamics of Memorisation in Large Language Models: A Case Study with Random Strings

Till Speicher, Mohammad Aflah Khan, Qinyuan Wu, Vedant Nanda, Soumi Das, Bishwamittra Ghosh, Krishna P. Gummadi, Evimaria Terzi

Under Review

Learning without Memorization Considered Infeasible: Rethinking Memorization Measures in LLMs

Bishwamittra Ghosh, Soumi Das, Qinyuan Wu, Mohammad Aflah Khan, Krishna P. Gummadi, Evimaria Terzi, Deepak Garg

Under Review

2024

The Duality of Hope: A Critical Examination of Controversial Annotations in HopeEDI

Mohammad Aflah Khan*, Neemesh Yadav*, Diksha Sethi*, Raghav Sahni* • 2024

The Second Tiny Papers Track at ICLR 2024

Probing Critical Learning Dynamics of PLMs for Hate Speech Detection

Sarah Masud*, Mohammad Aflah Khan*, Vikram Goyal, Md Shad Akhtar, Tanmoy Chakraborty • 2024

EACL 2024 - Findings of the Association for Computational Linguistics

2023

Overview of the HASOC Subtracks at FIRE 2023: Detection of Hate Spans and Conversational Hate-Speech

Shrey Satapara, Sarah Masud, Hiren Madhu, Mohammad Aflah Khan, Md Shad Akhtar, Tanmoy Chakraborty, Sandip Modha, Thomas Mandl • 2023

FIRE 2023 - Proceedings of the 15th Annual Meeting of the Forum for Information Retrieval Evaluation

Overview of the HASOC Subtrack at FIRE 2023: Identification of Tokens Contributing to Explicit Hate in English by Span Detection

Sarah Masud, Mohammad Aflah Khan, Md. Shad Akhtar, Tanmoy Chakraborty • 2023

In Working Notes of FIRE 2023 - Forum for Information Retrieval Evaluation

Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling

Stella Biderman, Hailey Schoelkopf, Quentin Gregory Anthony, Herbie Bradley, Kyle O’Brien, Eric Hallahan, Mohammad Aflah Khan, Shivanshu Purohit, USVSN Sai Prashanth, Edward Raff, Aviya Skowron, Lintang Sutawika, Oskar van der Wal • 2023

ICML 2023 - The Fortieth International Conference on Machine Learning

The Art of Embedding Fusion: Optimizing Hate Speech Detection

Mohammad Aflah Khan*, Neemesh Yadav*, Mohit Jain, Sanyam Goyal • 2023

The First Tiny Papers Track at ICLR 2023

Beyond Negativity: Re-Analysis and Follow-Up Experiments on Hope Speech Detection

Neemesh Yadav*, Mohammad Aflah Khan*, Diksha Sethi, Raghav Sahni • 2023

The First Tiny Papers Track at ICLR 2023

2022

Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization

Sarah Masud, Manjot Bedi, Mohammad Aflah Khan, Md Shad Akhtar, Tanmoy Chakraborty • 2022

KDD 2022 - Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

Talks

[Talk] LLMs at Scale

Max Planck Computing and Data Facility: MPCDF (AI Kick-off Workshop) • April, 2025

Max Planck Institute for the Science of Light (Hosted by Florian Marquardt) • April, 2025

[Talk] An Overview of DeepSeek-{V3/R1}

Max Planck Institute for Software Systems: MPI-SWS (Part of AI, Computing & Society Initiative) • February, 2025

[Talk] Democratizing and Accelerating Research with LLMs: Making Science More Accessible Whilst Finding Interesting Research Problems

Max Planck Institute for Security and Privacy: MPI-SP (Hosted by Meeyoung Cha) • December, 2024

[Demo + Lightning Talk] Empowering Research with Open-Access LLMs: From Tools to Copilots

AI, Computing & Society Initiative Launch Event (At Max Planck Institute for Software Systems: MPI-SWS) • December, 2024

[Paper Reading & Discussion] Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps

Max Planck Institute for Software Systems: MPI-SWS (Internal Paper Reading Group) • July, 2024

[Paper Reading & Discussion] Deduplicating Training Data Makes Language Models Better

Max Planck Institute for Software Systems: MPI-SWS (Internal Paper Reading Group) • May, 2024

[Talk] Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling

Max Planck Institute for Software Systems: MPI-SWS (Hosted by Krishna Gummadi) • July, 2023

Goldman Sachs (Internal NLP/IR Reading Group) • June, 2023

Organizing, Reviewing & Volunteering

The First Workshop on Large Language Model Memorization (L2M2) @ ACL'25

Program Committee Member • 2025

ACL Rolling Review (ARR), International Conference on Computational Linguistics (COLING), Workshop on Online Abuse and Harms (WOAH), The Technical Symposium on Computer Science Education (SIGCSE TS)

Served as a reviewer for the above-mentioned venues • 2023 Onwards

Organizer • 2023

Volunteer • 2022

Education

Indraprastha Institute of Information Technology (IIIT-D)

B.Tech. in Computer Science and Engineering • 2020 — 2024

Lal Bahadur Shastri School

Senior-Secondary Education (12th Grade) • 2020

Banyan Tree School

Secondary Education (10th Grade) • 2018

EleutherAI • September, 2023

Selected for Amazon ML Summer School

Amazon • 2022

All India Rank 491

JEE Mains Paper 2 • 2020

Top 0.66 Percentile

JEE Mains Paper 1 • 2020

All India Rank 130

Undergraduate Entrance Examination (UGEE) • 2020
