Score-Based Generative Classifiers

Roland S. Zimmermann
University of Tübingen & IMPRS-IS

Lukas Schott
University of Tübingen & IMPRS-IS

Yang Song
Stanford University

Benjamin A. Dunn
Norwegian University of Science and Technology

David A. Klindt
Norwegian University of Science and Technology

Code (Soon)

tl;dr: We evaluate score-based generative models as classifiers on CIFAR-10 and find that they yield good accuracy and likelihoods but no adversarial robustness.

News

Oct '21	Our paper was accepted at the Deep Generative Models and Downstream Applications Workshop at NeurIPS 2021!
Oct '21	The pre-print is now available on arXiv.

Abstract

The tremendous success of generative models in recent years raises the question of whether they can also be used to perform classification. Generative models have been used as adversarially robust classifiers on simple datasets such as MNIST, but this robustness has not been observed on more complex datasets like CIFAR-10. Additionally, on natural image datasets, previous results have suggested a trade-off between the likelihood of the data and classification accuracy. In this work, we investigate score-based generative models as classifiers for natural images. We show that these models not only obtain competitive likelihood values but simultaneously achieve state-of-the-art classification accuracy for generative classifiers on CIFAR-10. Nevertheless, we find that these models are only slightly, if at all, more robust than discriminative baseline models on out-of-distribution tasks based on common image corruptions. Similarly, and contrary to prior results, we find that score-based generative classifiers are prone to worst-case distribution shifts in the form of adversarial perturbations. Our work highlights that score-based generative models are closing the gap in classification accuracy compared to standard discriminative models. While they do not yet deliver on the promise of adversarial and out-of-domain robustness, they provide a different approach to classification that warrants further research.

Model Comparison. Previous approaches have demonstrated a trade-off between accuracy and likelihoods of generative classifiers on CIFAR-10. The black lines show current state-of-the-art discriminative (horizontal) and generative (vertical) models.

Competitive Classification Accuracies & Likelihoods

Our main result is that the latest advances in score-based generative modeling of natural images translate into generative classifiers which have highly competitive classification accuracies as well as likelihoods, as can be seen for our Score-Based Generative Classifier (SBGC) below.

Model approach	Accuracy [%] ↑	NLL [bits/dim.] ↓

Invertible Network (Mackowiak et al.)	67.30	4.34
GLOW (Fetaya et al.)	84.00	3.53
Normalizing flow (Ardizzone et al.)	89.73	5.25
Energy model (Grathwohl et al.)	92.90	N/A
SBGC (ours)	95.04	3.11

WideResNet-28-12 (Targ et al.)	95.42	N/A
ViT-H/14 (Dosovitskiy et al.)	99.50	N/A
VDM (Kingma et al.)	N/A	2.49

Model Comparison. Accuracy (in %) and negative log-likelihoods (NLL) in bits per dimension on the CIFAR-10 test set. Higher accuracy and lower NLL are better. The lower half of the table shows a baseline discriminative model, and the current state-of-the-art discriminative and generative models.
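To make the two reported quantities concrete, here is a minimal sketch of how a generative classifier can turn class-conditional log-likelihoods into a prediction and a bits-per-dimension value. It assumes the standard recipe of applying Bayes' rule with a uniform class prior. The function names, the three-class setup and the log-likelihood values are hypothetical, and the conversion ignores any dequantization or rescaling offset that a real model would add.

```python
import numpy as np

def bits_per_dim(log_lik_nats, num_dims):
    """Convert a log-likelihood in nats into an NLL in bits per dimension."""
    return -log_lik_nats / (num_dims * np.log(2.0))

def class_posterior(log_lik_per_class, log_prior=None):
    """Bayes' rule in log space: log p(y|x) = log p(x|y) + log p(y) - log p(x)."""
    log_lik = np.asarray(log_lik_per_class, dtype=np.float64)
    num_classes = log_lik.shape[-1]
    if log_prior is None:
        # Assumption: uniform prior over classes.
        log_prior = np.full(num_classes, -np.log(num_classes))
    log_joint = log_lik + log_prior
    # Log-sum-exp with max subtraction for numerical stability.
    m = log_joint.max(axis=-1, keepdims=True)
    log_evidence = m + np.log(np.exp(log_joint - m).sum(axis=-1, keepdims=True))
    posterior = np.exp(log_joint - log_evidence)
    return posterior, log_evidence.squeeze(-1)

# Hypothetical values of log p(x|y) in nats for one image and three classes.
log_lik = np.array([[-7000.0, -6990.0, -7010.0]])

posterior, log_px = class_posterior(log_lik)
prediction = posterior.argmax(axis=-1)               # class with highest p(y|x)
nll_bpd = bits_per_dim(log_px, num_dims=3 * 32 * 32)  # CIFAR-10 image size
```

Because the posterior depends only on differences between per-class log-likelihoods, a model can reach a good marginal likelihood and still classify poorly, or the other way around, which is one way to read the trade-off reported for earlier approaches.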

Improvements to Out-of-Domain Robustness?

Most previous approaches to improving out-of-distribution performance on common image corruptions have relied on data augmentation, either through carefully hand-crafted transformations or through adversarial training. Here, we propose an orthogonal approach that is based on different modeling assumptions. Specifically, we are interested in seeing whether the inductive bias implicit in score-based generative classifiers improves their classification accuracy when generalizing to the image domains of CIFAR-10-C. We find that our SBGC model performs partially better than a baseline trained with the same, very weak data augmentations: it is more accurate when averaged over all corruptions (76.24% vs. 74.72%) but less accurate once the noise corruptions are excluded (75.71% vs. 80.21%).

Model	CIFAR-10 ↑	CIFAR-10-C, all corruptions ↑	CIFAR-10-C, w/o noises ↑

Models w/ data augm.
ResNeXt29 + AugMix (Hendrycks et al.)	95.83	89.09	90.51
ResNet-50 + adv. augm. (Calian et al.)	94.93	92.17	92.53

Models w/ simple data augm.
WideResNet-28-12 (Targ et al.)	95.42	74.72	80.21
SBGC (ours)	95.04	76.24	75.71

Performance on Common Image Corruptions. Accuracy (in %) on the clean CIFAR-10 test set and mean accuracy on CIFAR-10-C (for SBGC, evaluated on a random subset with 10% of the original size because of computational limitations). We refer to random flips, random crops and uniform noise as simple augmentations.
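The two CIFAR-10-C columns and the 10% subset can be sketched as follows. This is an illustration under assumptions rather than the evaluation code: we assume that "w/o noises" excludes the Gaussian, shot and impulse noise corruptions, the helper names are hypothetical, and the per-corruption accuracies are placeholders, not results.

```python
import numpy as np

# Assumption: the noise-type corruptions that the "w/o noises" column leaves out.
NOISE_CORRUPTIONS = {"gaussian_noise", "shot_noise", "impulse_noise"}

def mean_corruption_accuracy(acc_by_corruption, exclude=()):
    """Average per-corruption accuracies, optionally skipping some corruptions."""
    kept = [acc for name, acc in acc_by_corruption.items() if name not in exclude]
    return float(np.mean(kept))

def random_subset_indices(num_examples, fraction=0.1, seed=0):
    """Draw a random subset of example indices without replacement."""
    rng = np.random.default_rng(seed)
    size = int(round(fraction * num_examples))
    return rng.choice(num_examples, size=size, replace=False)

# Placeholder accuracies (in %) for a handful of corruptions, for illustration only.
acc = {
    "gaussian_noise": 60.0,
    "shot_noise": 62.0,
    "impulse_noise": 64.0,
    "fog": 80.0,
    "contrast": 84.0,
}

mean_all = mean_corruption_accuracy(acc)
mean_without_noise = mean_corruption_accuracy(acc, exclude=NOISE_CORRUPTIONS)

# Indices of a 10% evaluation subset of a 10,000-image corruption split.
subset = random_subset_indices(10_000, fraction=0.1)
```

Evaluating on a fixed-seed subset keeps the comparison reproducible, at the cost of a noisier accuracy estimate than the full set would give.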

While our approach partially improves robustness against common image corruptions, it strikingly fails in the presence of adversarial perturbations. Under both the standard ℓ_∞ perturbation bound (8/255) and the standard ℓ₂ perturbation bound (0.5), our model misclassifies every adversarially perturbed input image.

Model	Clean ↑	ℓ_∞ ↑	ℓ₂ ↑

SBGC (ours)	95.04	0.00	0.00
WideResNet-70-16 + augm. (Rebuffi et al.)	92.23	82.32	66.56

Performance on Adversarial Perturbations. Accuracy (in %) of different models against adversarial perturbations generated by a norm-bounded ℓ_∞ and ℓ₂ PGD attack (against each specific model) with a bound of ε=8/255 and ε=0.5, respectively. The bottom row shows the current state-of-the-art model for adversarially robust classification on CIFAR-10 as a reference value.
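For readers unfamiliar with norm-bounded PGD, the sketch below shows the general procedure on a toy linear softmax classifier whose input gradient is available in closed form. It is not the attack code used for SBGC, where gradients would have to flow through the likelihood computation. The toy model, the number of steps and the step-size heuristic of 2.5 ε / steps are all assumptions made for illustration.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def loss_grad_wrt_input(x, y, W, b):
    """Gradient of the cross-entropy loss of a linear softmax model w.r.t. x."""
    p = softmax(x @ W + b)           # shape (n, k)
    p[np.arange(len(y)), y] -= 1.0   # dL/dlogits = p - onehot(y)
    return p @ W.T                   # shape (n, d)

def project(delta, eps, norm):
    """Project a perturbation back onto the eps-ball of the chosen norm."""
    if norm == "linf":
        return np.clip(delta, -eps, eps)
    # l2: shrink only those perturbations whose norm exceeds eps.
    norms = np.linalg.norm(delta, axis=-1, keepdims=True)
    return delta * np.minimum(1.0, eps / np.maximum(norms, 1e-12))

def pgd_attack(x, y, W, b, eps, norm="linf", steps=20, step_size=None):
    """Projected gradient ascent on the loss inside an eps-ball around x."""
    step_size = 2.5 * eps / steps if step_size is None else step_size
    delta = np.zeros_like(x)
    for _ in range(steps):
        g = loss_grad_wrt_input(x + delta, y, W, b)
        if norm == "linf":
            step = np.sign(g)
        else:
            step = g / np.maximum(np.linalg.norm(g, axis=-1, keepdims=True), 1e-12)
        delta = project(delta + step_size * step, eps, norm)
        # Keep the perturbed image inside the valid pixel range [0, 1].
        delta = np.clip(x + delta, 0.0, 1.0) - x
    return x + delta

# Toy data: random "images" in [0, 1] and a random linear classifier.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=(4, 3 * 32 * 32))
W = rng.normal(scale=0.1, size=(x.shape[1], 10))
b = np.zeros(10)
y = (x @ W + b).argmax(axis=-1)  # use the toy model's own predictions as labels

x_adv_linf = pgd_attack(x, y, W, b, eps=8 / 255, norm="linf")
x_adv_l2 = pgd_attack(x, y, W, b, eps=0.5, norm="l2")
```

Robust accuracy is then the fraction of perturbed inputs that the attacked model still classifies correctly, computed separately for each norm.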

Acknowledgements & Funding

We thank Matthias Bethge, Wieland Brendel, Will Grathwohl, Zahra Kadkhodaie, Dylan Paiton, Ben Poole, Yash Sharma and Eero Simoncelli.
Further, we thank the International Max Planck Research School for Intelligent Systems (IMPRS-IS) for supporting RSZ and LS. This work was partially supported by a Research Council of Norway FRIPRO grant (90532703) and by the German Federal Ministry of Education and Research (BMBF) through the Competence Center for Machine Learning (TUE.AI, FKZ 01IS18039A).

BibTeX

When citing our project, please use our pre-print:

@article{zimmermann2021score,
  author = {
    Zimmermann, Roland S. and
    Schott, Lukas and
    Song, Yang and
    Dunn, Benjamin A. and
    Klindt, David A.
  },
  title = {
    Score-Based Generative Classifiers
  },
  journal = {CoRR},
  volume = {abs/2110.00473},
  year = {2021},
}
