Mohammed Safi Ur Rahman Khan

AI4Bharat Lab

Indian Institute of Technology, Madras

Chennai, Tamil Nadu, India

I am Safi (صفی), a second-year PhD student at the Wadhwani School of Data Science and Artificial Intelligence (DSAI) at IIT Madras and the AI4Bharat Lab, where I am advised by Prof. Mitesh M. Khapra. My current research focuses on resources and evaluation for multilingual Large ‘X’ Language Models (where X = ‘ ’, ‘Vision’, or ‘Audio’). I am also currently a Research Fellow at Sarvam, helping build Sovereign AI for India!

Previously, I was an AI Resident at the AI4Bharat Lab at IIT Madras, where I was fortunate to be part of the IndicLLMSuite effort (guided by Prof. Mitesh M. Khapra). Before that, I did my M.Tech in Computer Science and Engineering at IIT Madras (again!!), where I got to work on “Narrow Domain Adaptation of Speech Recognition Systems”, guided by Prof. Pratyush Kumar and (you guessed it!) Prof. Mitesh M. Khapra.

news

Jun 29, 2025	Will be attending ACL 2025 in Vienna, Insha’Allah!! Will be co-presenting FairITales, CIA, FERMAT, and BhasaAnuvaad. Please catch us at our posters to learn more about our work.
Jun 01, 2025	I’ll be joining Sarvam AI as a Research Fellow to help with the Sovereign LLM efforts. Will be working with the Alignment and Evaluations team to build an India-first 🇮🇳 model!!
May 25, 2025	4 papers accepted to ACL 2025, Alhamdulillah!! CIA, FERMAT, BhasaAnuvaad, and FairITales (preprint out soon). Vienna, hope to see you soon, Insha’Allah.
Apr 13, 2025	I’ll be in Singapore 🇸🇬 for ICLR 2025. Looking forward to making new connections!
Nov 14, 2024	FBI wins 🏆 Outstanding paper too!! Alhamdulillah.
Nov 07, 2024	I’ll be in Miami 🇺🇸 for EMNLP 2024 to present FBI. Thank you, Google, for sponsoring this. Looking forward to connecting with y’all!!

latest posts

Mar 14, 2024	Indic LLM Suite | AI4Bharat Blog

selected publications

ACL 2025

FairI Tales: Evaluation of Fairness in Indian Contexts with a Focus on Bias and Stereotypes

Janki Atul Nawale^*, Mohammed Safi Ur Rahman Khan^*, Janani D, Mansi Gupta, Danish Pruthi, and Mitesh M. Khapra

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

@inproceedings{nawale2025fairi,
  title = {FairI Tales: Evaluation of Fairness in Indian Contexts with a Focus on Bias and Stereotypes},
  author = {Nawale, Janki Atul and Khan, Mohammed Safi Ur Rahman and D, Janani and Gupta, Mansi and Pruthi, Danish and Khapra, Mitesh M.},
  year = {2025},
  booktitle = {Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  data = {https://huggingface.co/datasets/ai4bharat/Indic-Bias},
}

NAACL 2025

MILU: A Multi-task Indic Language Understanding Benchmark

Sshubam Verma, Mohammed Safi Ur Rahman Khan, Vishwajeet Kumar, Rudra Murthy, and Jaydeep Sen

Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics, 2025

@inproceedings{verma2024milu,
  title = {MILU: A Multi-task Indic Language Understanding Benchmark},
  author = {Verma, Sshubam and Khan, Mohammed Safi Ur Rahman and Kumar, Vishwajeet and Murthy, Rudra and Sen, Jaydeep},
  year = {2025},
  booktitle = {Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics},
  data = {https://huggingface.co/datasets/ai4bharat/MILU},
}

ACL 2025

Cross-Lingual Auto Evaluation for Assessing Multilingual LLMs

Sumanth Doddapaneni^*, Mohammed Safi Ur Rahman Khan^*, Dilip Venkatesh, Raj Dabre, Anoop Kunchukuttan, and Mitesh M. Khapra

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

@inproceedings{doddapaneni2024crosslingual,
  title = {Cross-Lingual Auto Evaluation for Assessing Multilingual LLMs},
  author = {Doddapaneni, Sumanth and Khan, Mohammed Safi Ur Rahman and Venkatesh, Dilip and Dabre, Raj and Kunchukuttan, Anoop and Khapra, Mitesh M.},
  year = {2025},
  booktitle = {Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  data = {https://huggingface.co/collections/ai4bharat/cia-suite-66ea9a7e18a6c70bd8de27a1},
}

EMNLP 2024

Finding Blind Spots in Evaluator LLMs with Interpretable Checklists

Sumanth Doddapaneni^*, Mohammed Safi Ur Rahman Khan^*, Sshubam Verma, and Mitesh M. Khapra

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Nov 2024

EMNLP-2024 Outstanding Paper Award

@inproceedings{doddapaneni2024finding,
  title = {Finding Blind Spots in Evaluator LLMs with Interpretable Checklists},
  author = {Doddapaneni, Sumanth and Khan, Mohammed Safi Ur Rahman and Verma, Sshubam and Khapra, Mitesh M.},
  year = {2024},
  booktitle = {Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing},
  month = nov,
  address = {Miami, Florida, USA},
  publisher = {Association for Computational Linguistics},
  pages = {16279--16309},
  data = {https://huggingface.co/datasets/ai4bharat/FBI},
}

ACL 2024

IndicLLMSuite: A Blueprint for Creating Pre-training and Fine-Tuning Datasets for Indian Languages

Mohammed Safi Ur Rahman Khan^*, Priyam Mehta^*, Ananth Sankar, Umashankar Kumaravelan, Sumanth Doddapaneni, Suriyaprasaad G, Varun Balan G, Sparsh Jain, Anoop Kunchukuttan, Pratyush Kumar, Raj Dabre, and Mitesh M. Khapra

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Aug 2024

ACL-2024 Outstanding Paper Award

@inproceedings{khan2024indicllmsuite,
  title = {IndicLLMSuite: A Blueprint for Creating Pre-training and Fine-Tuning Datasets for Indian Languages},
  author = {Khan, Mohammed Safi Ur Rahman and Mehta, Priyam and Sankar, Ananth and Kumaravelan, Umashankar and Doddapaneni, Sumanth and G, Suriyaprasaad and G, Varun Balan and Jain, Sparsh and Kunchukuttan, Anoop and Kumar, Pratyush and Dabre, Raj and Khapra, Mitesh M.},
  year = {2024},
  booktitle = {Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  month = aug,
  address = {Bangkok, Thailand},
  publisher = {Association for Computational Linguistics},
  pages = {15831--15879},
  data = {https://huggingface.co/collections/ai4bharat/indicllmsuite-65ee7d225c337fcfa0991707},
}