Score-Based Generative Classifiers

Roland S. Zimmermann
University of Tübingen & IMPRS-IS
Lukas Schott
University of Tübingen & IMPRS-IS
Yang Song
Stanford University
Benjamin A. Dunn
Norwegian University of Science and Technology
David A. Klindt
Norwegian University of Science and Technology

tl;dr: We evaluate score-based generative models as classifiers on CIFAR-10 and find that they yield good accuracy and likelihoods but no adversarial robustness.


Oct '21 Our paper was accepted at the Deep Generative Models and Downstream Applications Workshop at NeurIPS 2021!
Oct '21 The pre-print is now available on arXiv.


Model Comparison. Previous approaches have demonstrated a trade-off between accuracy and likelihoods of generative classifiers on CIFAR-10. The black lines show current state-of-the-art discriminative (horizontal) and generative (vertical) models.

Competitive Classification Accuracies & Likelihoods

Our main result is that the latest advances in score-based generative modeling of natural images translate into generative classifiers that achieve highly competitive classification accuracies as well as likelihoods, as shown for our Score-Based Generative Classifier (SBGC) below.
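To make the decision rule concrete, here is a minimal sketch of how any generative classifier turns per-class likelihoods into a prediction via Bayes' rule, argmax_y log p(x|y) + log p(y). It assumes the class-conditional log-likelihoods log p(x|y) have already been computed (e.g., by a class-conditional score-based model); the function name and numbers are ours, for illustration only.

```python
import numpy as np

def generative_classify(log_px_given_y, log_prior=None):
    """Pick the class whose conditional density assigns the input the
    highest log-likelihood: argmax_y log p(x|y) + log p(y)."""
    log_px_given_y = np.asarray(log_px_given_y, dtype=float)
    if log_prior is None:
        # Uniform class prior (e.g., 10 balanced classes for CIFAR-10).
        log_prior = np.zeros_like(log_px_given_y)
    return int(np.argmax(log_px_given_y + log_prior))

# Hypothetical example: the second class-conditional model fits x best.
print(generative_classify([-3512.4, -3498.7, -3540.1]))  # -> 1
```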

Model Accuracy [%] ↑ NLL [bits/dim.] ↓
Invertible Network (Mackowiak et al.) 67.30 4.34
GLOW (Fetaya et al.) 84.00 3.53
Normalizing flow (Ardizzone et al.) 89.73 5.25
Energy model (Grathwohl et al.) 92.90 N/A
SBGC (ours) 95.04 3.11
WideResNet-28-12 (Targ et al.) 95.42 N/A
ViT-H/14 (Dosovitskiy et al.) 99.50 N/A
VDM (Kingma et al.) N/A 2.49
Model Comparison. Accuracy (in %) and negative log-likelihood (NLL) in bits per dimension on the CIFAR-10 test set. The lower half of the table shows a baseline discriminative model as well as the current state-of-the-art discriminative and generative models.
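For readers unfamiliar with the NLL metric in the table: bits per dimension is the total negative log-likelihood of an image, converted from nats to bits and divided by the number of pixels times channels (32 × 32 × 3 = 3072 for CIFAR-10). A small sketch of the conversion; the function name and the example NLL value are ours, for illustration only.

```python
import math

def nats_to_bits_per_dim(nll_nats, num_dims=32 * 32 * 3):
    """Convert a per-image NLL in nats to the bits-per-dimension
    metric reported in the table (CIFAR-10: 3072 dimensions)."""
    return nll_nats / (num_dims * math.log(2))

# A hypothetical total NLL of 6545 nats per image is about 3.07 bits/dim.
print(round(nats_to_bits_per_dim(6545.0), 2))  # -> 3.07
```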

Improvements to Out-of-Distribution Robustness?

Most previous approaches to improving out-of-distribution performance on common image corruptions rely on some form of data augmentation, either carefully hand-crafted transformations or adversarial training. Here, we propose an orthogonal approach based on different modeling assumptions. Specifically, we investigate whether the inductive bias implicit in score-based generative classifiers improves their classification accuracy when generalizing to the image domains of CIFAR-10-C. We find that our SBGC model partially outperforms previous models trained with the same, very weak data augmentations.
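The CIFAR-10-C numbers below are averages over corruption types and severity levels. A minimal sketch of that aggregation, assuming per-corruption accuracies are already available; the function name and the example values are ours, for illustration only.

```python
import numpy as np

def mean_corruption_accuracy(acc):
    """Average accuracy over all corruption types and severities.
    `acc` maps corruption name -> list of accuracies (severities 1-5)."""
    return float(np.mean([np.mean(v) for v in acc.values()]))

# Hypothetical per-corruption accuracies at five severities:
print(round(mean_corruption_accuracy({
    "gaussian_noise": [0.80, 0.70, 0.60, 0.50, 0.40],
    "fog": [0.90, 0.88, 0.85, 0.80, 0.75],
}), 3))  # -> 0.718
```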

Model CIFAR-10 ↑ CIFAR-10-C (all corruptions) ↑ CIFAR-10-C (w/o noises) ↑
Models w/ data augm.
ResNeXt29 + AugMix (Hendrycks et al.) 95.83 89.09 90.51
ResNet-50 + adv. augm (Calian et al.) 94.93 92.17 92.53
Models w/ simple data augm.
WideResNet-28-12 (Targ et al.) 95.42 74.72 80.21
SBGC (ours) 95.04 76.24 75.71
Performance on Common Image Corruptions. Accuracy (in %) on the clean CIFAR-10 test set and mean accuracy on CIFAR-10-C (for SBGC, evaluated on a random subset containing 10% of the original test set due to computational limitations). We refer to random flips, random crops and uniform noise as simple augmentations.

While our approach partially improves robustness against common image corruptions, it strikingly fails in the presence of adversarial perturbations. Within both the standard ℓ∞ perturbation bound (8/255) and the standard ℓ2 perturbation bound (0.5), our model misclassifies every adversarially perturbed input image.

Model Clean ↑ ℓ∞ ↑ ℓ2 ↑
SBGC (ours) 95.04 0.00 0.00
WideResNet-70-16 + augm. (Rebuffi et al.) 92.23 82.32 66.56
Performance on Adversarial Perturbations. Accuracy (in %) of different models against adversarial perturbations generated by norm-bounded ℓ∞ and ℓ2 PGD attacks (against each specific model) with bounds of ε=8/255 and ε=0.5, respectively. The bottom row shows the current state-of-the-art model for adversarially robust classification on CIFAR-10 as a reference.
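For reference, the ℓ∞ attack in the table follows the standard PGD recipe: repeatedly step in the sign of the input gradient and project back into the ε-ball around the clean image. A minimal NumPy sketch, where `grad_fn` is a stand-in for the gradient of the model's loss with respect to the input; function and parameter names are ours, not from the paper's code.

```python
import numpy as np

def pgd_linf(x, y, grad_fn, eps=8 / 255, step=2 / 255, n_steps=10):
    """Projected gradient descent within an l_inf ball of radius eps.
    grad_fn(x_adv, y) returns the loss gradient w.r.t. the input."""
    x_adv = x.copy()
    for _ in range(n_steps):
        g = grad_fn(x_adv, y)
        x_adv = x_adv + step * np.sign(g)         # ascent step on the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project into the eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)          # keep a valid image
    return x_adv

# Toy check with a constant "gradient": the perturbation saturates the ball.
x = np.full((3, 32, 32), 0.5)
adv = pgd_linf(x, None, lambda xa, y: np.ones_like(xa))
print(np.max(np.abs(adv - x)) <= 8 / 255 + 1e-9)  # -> True
```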

Acknowledgements & Funding

We thank Matthias Bethge, Wieland Brendel, Will Grathwohl, Zahra Kadkhodaie, Dylan Paiton, Ben Poole, Yash Sharma and Eero Simoncelli.
Further, we thank the International Max Planck Research School for Intelligent Systems (IMPRS-IS) for supporting RSZ and LS. This work was partially supported by a Research Council of Norway FRIPRO grant (90532703) and by the German Federal Ministry of Education and Research (BMBF) through the Competence Center for Machine Learning (TUE.AI, FKZ 01IS18039A).


When citing our project, please use our pre-print:

@article{Zimmermann2021ScoreBased,
  author  = {Zimmermann, Roland S. and
             Schott, Lukas and
             Song, Yang and
             Dunn, Benjamin A. and
             Wallis, Thomas S. A. and
             Klindt, David A.},
  title   = {Score-Based Generative Classifiers},
  journal = {CoRR},
  volume  = {abs/2110.00473},
  year    = {2021},
}