Title: The Quest for Super-Efficiency: Workflow and Research Status of Large-Scale Model Quantization
Speaker: Yuma Ichikawa (Principal Researcher, Fujitsu Laboratories Ltd.; Project Researcher, RIKEN AI Research Center)
Abstract: In recent years, as Large Language Models (LLMs) have grown in scale, challenges such as slower inference, higher memory usage, and rising power costs have become apparent. To address these issues, quantization techniques, which discretize real-valued representations to improve computational efficiency, have attracted attention. However, ultra-low-bit quantization of LLMs has been reported to be difficult. We tackled this challenge and succeeded in retaining, on average, 90% of performance on standard benchmarks even under ultra-low-bit (1-bit) quantization. This presentation gives a systematic overview of the LLM quantization workflow that enabled this result and of the latest research trends.
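For readers unfamiliar with the technique the abstract refers to, the sketch below illustrates what "discretizing real-valued representations" means in practice. It is a minimal, hypothetical NumPy example (the helper names `quantize`, `dequantize`, and `binarize` are ours, not the speaker's): symmetric uniform b-bit quantization, plus sign-based binarization with a single real scale for the 1-bit case. It is not the method presented in the talk.

```python
# Minimal sketch of weight quantization, for illustration only.
# Assumptions: NumPy only; symmetric per-tensor scheme; the helper
# names are hypothetical and do not come from the talk.
import numpy as np

def quantize(w: np.ndarray, bits: int = 8):
    """Map real-valued weights to signed `bits`-bit integers (bits >= 2)."""
    qmax = 2 ** (bits - 1) - 1              # e.g. 127 for 8-bit
    scale = np.abs(w).max() / qmax          # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int32)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original real-valued weights."""
    return q.astype(np.float32) * scale

def binarize(w: np.ndarray):
    """1-bit quantization: keep only sign(w) plus one real scale alpha.
    alpha = mean(|w|) minimizes the L2 error of alpha * sign(w) vs. w."""
    return np.sign(w), np.abs(w).mean()

w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize(w, bits=4)
print("4-bit reconstruction error:", np.abs(w - dequantize(q, s)).max())
b, alpha = binarize(w)
print("1-bit reconstruction error:", np.abs(w - alpha * b).max())
```

Practical LLM quantization workflows refine this core with per-channel or per-group scales and calibration data, but the round-and-rescale idea is the same.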
- Date
- 27 March 2026 (Fri), 18:00–20:00
- Venue
- Online
- Organizer
- Co-organizers: HRAM The Japan Society for Industrial and Applied Mathematics, D-DRIVE National Network
- Participation Fee
- Free (advance registration required): https://www-mmds.sigmath.es.osaka-u.ac.jp/structure/activity/ai_data.php?id=106
- Web
- https://www-mmds.sigmath.es.osaka-u.ac.jp/structure/activity/ai_data.php?id=106
- Contact
- Takashi Suzuki (suzuki@sigmath.es.osaka-u.ac.jp)