Project Overview
I am currently working on a research project focused on understanding and designing efficient large language model (LLM) architectures for constrained and real-world deployment settings. The project adopts an architecture-first research approach, studying how transformer design choices influence representation quality, robustness, and efficiency under limited computational budgets.
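As a concrete illustration of the kind of budget-driven comparison this involves, the sketch below estimates parameter counts for two compact decoder-only configurations so that design choices (depth vs. width) can be compared under a roughly fixed budget. The class name, default sizes, tied embeddings, and 4x MLP expansion are illustrative assumptions, not the project's actual settings.

```python
# Hypothetical sketch: rough parameter-count estimate for a compact decoder-only
# transformer, used to compare design choices under a fixed parameter budget.
# All names, defaults, and the 4x MLP expansion factor are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class CompactConfig:
    vocab_size: int = 32_000
    d_model: int = 512
    n_layers: int = 8
    n_heads: int = 8
    mlp_ratio: int = 4  # feed-forward hidden size = mlp_ratio * d_model

    def param_count(self) -> int:
        # Token embeddings (assumed tied with the output head, so counted once).
        embed = self.vocab_size * self.d_model
        # Per layer: attention projections (Q, K, V, O) plus two MLP matrices.
        attn = 4 * self.d_model * self.d_model
        mlp = 2 * self.d_model * (self.mlp_ratio * self.d_model)
        return embed + self.n_layers * (attn + mlp)

# Compare two ways of spending roughly the same budget: deeper-narrower vs. shallower-wider.
deep_narrow = CompactConfig(d_model=512, n_layers=12)
shallow_wide = CompactConfig(d_model=640, n_layers=8)
for cfg in (deep_narrow, shallow_wide):
    print(f"d_model={cfg.d_model}, layers={cfg.n_layers}: ~{cfg.param_count()/1e6:.1f}M params")
```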
My work involves training compact transformer models from scratch and systematically analyzing the effects of architectural constraints and model compression techniques. Rather than treating compression purely as an optimization step, the project uses it as an analytical tool to study information bottlenecks, representation degradation, and architectural redundancy within language models.
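To make the "compression as an analytical tool" idea concrete, here is a minimal, hypothetical probe: magnitude-prune a single linear layer at increasing sparsity levels and measure how far its outputs drift from the dense baseline. The layer size, sparsity schedule, and cosine-based drift metric are assumptions for illustration, not the project's actual analysis protocol.

```python
# Hypothetical sketch of compression as an analysis probe: magnitude-prune one
# linear layer at several sparsity levels and measure representation drift
# relative to the dense layer. Sizes and thresholds are illustrative only.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
layer = torch.nn.Linear(512, 512, bias=False)
x = torch.randn(64, 512)          # a batch of stand-in activations
baseline = layer(x)

for sparsity in (0.25, 0.5, 0.75, 0.9):
    with torch.no_grad():
        w = layer.weight
        threshold = torch.quantile(w.abs().flatten(), sparsity)
        pruned_w = torch.where(w.abs() < threshold, torch.zeros_like(w), w)
        pruned_out = x @ pruned_w.T
    # Drift = 1 - mean cosine similarity between dense and pruned outputs.
    drift = 1.0 - F.cosine_similarity(baseline, pruned_out, dim=-1).mean().item()
    print(f"sparsity={sparsity:.2f}  mean representation drift={drift:.4f}")
```

Plotting drift against sparsity in this way is one simple lens on where architectural redundancy lives and when representations begin to degrade.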
A key motivation of this research is improving the feasibility of on-device and edge deployment of LLMs, particularly for underrepresented and multilingual contexts. The project emphasizes reproducibility, interpretability, and methodological rigor, with the long-term goal of contributing principled insights into how modern language models encode and retain linguistic information under resource constraints.
Reading List & Research Papers
A collection of foundational papers and resources I am referencing for this project.
- Attention Is All You Need
- Language Models are Few-Shot Learners (GPT-3)
- Training Compute-Optimal Large Language Models (Chinchilla)
- A Mathematical Framework for Transformer Circuits
More resources and PDFs will be available soon in LLM_research.