Multi-level Logit Distillation

January 2023

[Figure: Overall framework]

Abstract

Knowledge Distillation (KD) aims to transfer the knowledge of a large teacher model to a lightweight student model. Mainstream KD methods fall into two categories: logit distillation and feature distillation. The former is easy to implement but inferior in performance, while the latter is not applicable in some practical circumstances due to concerns such as privacy and safety. To address this dilemma, in this paper we explore a stronger logit distillation method that makes better use of logit outputs. Concretely, we propose a simple yet effective approach to logit distillation via multi-level prediction alignment. Within this framework, prediction alignment is conducted not only at the instance level but also at the batch and class levels, so that the student model learns instance prediction, input correlation, and category correlation simultaneously. In addition, a prediction augmentation mechanism based on model calibration further boosts performance. Extensive experimental results validate that our method consistently outperforms previous logit distillation methods and even reaches performance competitive with mainstream feature distillation methods. We promise to release our code and models to ensure reproducibility.
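The three alignment levels named in the abstract can be illustrated with a minimal NumPy sketch. This is not the released implementation. The abstract does not state the loss forms, so the choices below are assumptions made for illustration: a KL divergence at the instance level, a squared difference between B x B Gram matrices at the batch level, and a squared difference between C x C Gram matrices at the class level. Prediction augmentation is assumed here to mean softening the logits with several temperatures. The names `softmax`, `multi_level_loss`, and `temperatures` are hypothetical.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Row-wise softmax of a (batch, classes) array, softened by a temperature."""
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def multi_level_loss(student_logits, teacher_logits,
                     temperatures=(1.0, 2.0, 4.0), eps=1e-12):
    """Assumed multi-level alignment loss, averaged over several temperatures."""
    batch_size = student_logits.shape[0]
    total = 0.0
    for t in temperatures:  # assumed form of "prediction augmentation"
        p_s = softmax(student_logits, t)
        p_t = softmax(teacher_logits, t)

        # Instance level: KL(teacher || student), averaged over the batch.
        instance = np.sum(p_t * (np.log(p_t + eps) - np.log(p_s + eps))) / batch_size

        # Batch level: align B x B matrices of input-to-input correlations.
        batch = np.mean((p_s @ p_s.T - p_t @ p_t.T) ** 2)

        # Class level: align C x C matrices of category-to-category correlations.
        cls = np.mean((p_s.T @ p_s - p_t.T @ p_t) ** 2)

        total += instance + batch + cls
    return total / len(temperatures)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    teacher = rng.normal(size=(8, 5))  # 8 inputs, 5 classes (toy values)
    student = rng.normal(size=(8, 5))
    print(multi_level_loss(student, teacher))
```

In this sketch the three terms are summed with equal weight, which is also an assumption; a practical setup would likely combine the result with the usual cross-entropy loss on ground-truth labels.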

Type

Conference paper

Publication

In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023

Jiaqi Wang 王佳琦

Research Director
JD Explore Academy

Jiaqi Wang is currently a Research Director at JD Explore Academy, leading the research and development of large language models (LLMs) and vision-language models (VLMs). Previously, he was a Research Scientist at Shanghai AI Laboratory. He also serves as an Adjunct Ph.D. Supervisor at Shanghai Innovation Institute.
