Andrei Panferov

ML Research Scientist

Yandex Research

Biography

I’m a research scientist at Yandex Research. My research interests include natural language processing, efficient deep learning, and federated learning. I’m currently a final-year bachelor’s student at the Moscow Institute of Physics and Technology (MIPT).

Interests

Natural Language Processing
Efficient Deep Learning
Federated Learning

Education

PhD in Computer Science, 2024-now
ISTA
BSc in Applied Mathematics and Physics, 2020-2024
Moscow Institute of Physics and Technology
Machine Learning Engineer, 2021-2023
Yandex School of Data Analysis

Experience

Senior ML Engineer (NLP)

Wildberries

April 2024 – Present Remote

Overseeing LLM deployment

ML Research Scientist

Yandex Research

November 2023 – February 2024 Moscow, Russia

Wrote a first-author paper on LLM compression
Achieved state-of-the-art results on LLM compression, reducing model size by 87% with acceptable loss in performance
Wrote efficient inference kernels using Triton and C++, speeding up LLM inference by up to 320%
Integrated the framework into the transformers library, enabling low-RAM dispatch and reducing instance RAM requirements by 70%

Research Intern

KAUST, Optimization and Machine Learning Lab

July 2023 – September 2023 Saudi Arabia

Conducted research under the supervision of Prof. Peter Richtárik
Wrote a first-author paper on Correlated Quantization

Infrastructure Developer

Eqvilent (HFT Fund)

July 2022 – March 2023 Remote

ML Engineer Intern (NLP)

Yandex

March 2022 – July 2022 Moscow, Russia

Enabled abstract tabular data insertion for efficient map-reduce LLM inference, speeding up tabular data processing by 120%
Increased test coverage of the map-reduce inference interface from 0% to 85% through rigorous unit testing

Researcher

Terra Quantum AG

July 2020 – July 2022 Moscow, Russia

Researched quantum algorithms for business applications.
Developed an NMR spectra analysis tool, enabling its use in quantum computations.
Optimized LLM deployment for chat assistant applications, reducing latency by 40%.