Papers

The surprising strength of weak classifiers for validating neural posterior estimates

V Bansal, T Chen, JG Scott - arXiv preprint arXiv:2507.17026, 2025 - arxiv.org
Statistics paper (stat.ML)

Link to paper: http://arxiv.org/abs/2507.17026v1

BibTeX

@article{2507.17026v1,
Author = {Vansh Bansal and Tianyu Chen and James G. Scott},
Title = {The surprising strength of weak classifiers for validating neural
posterior estimates},
Eprint = {2507.17026v1},
ArchivePrefix = {arXiv},
PrimaryClass = {stat.ML},
Abstract = {Neural Posterior Estimation (NPE) has emerged as a powerful approach for
amortized Bayesian inference when the true posterior $p(\theta \mid y)$ is
intractable or difficult to sample. But evaluating the accuracy of neural
posterior estimates remains challenging, with existing methods suffering from
major limitations. One appealing and widely used method is the classifier
two-sample test (C2ST), where a classifier is trained to distinguish samples
from the true posterior $p(\theta \mid y)$ versus the learned NPE approximation
$q(\theta \mid y)$. Yet despite the appealing simplicity of the C2ST, its
theoretical and practical reliability depends upon having access to a
near-Bayes-optimal classifier -- a requirement that is rarely met and, at best,
difficult to verify. Thus a major open question is: can a weak classifier still
be useful for neural posterior validation? We show that the answer is yes.
Building on the work of Hu and Lei, we present several key results for a
conformal variant of the C2ST, which converts any trained classifier's scores
-- even those of weak or over-fitted models -- into exact finite-sample
p-values. We establish two key theoretical properties of the conformal C2ST:
(i) finite-sample Type-I error control, and (ii) non-trivial power that
degrades gently in tandem with the error of the trained classifier. The upshot
is that even weak, biased, or overfit classifiers can still yield powerful and
reliable tests. Empirically, the conformal C2ST outperforms classical
discriminative tests across a wide range of benchmarks. These results reveal
the underappreciated strength of weak classifiers for validating neural
posterior estimates, establishing the conformal C2ST as a practical,
theoretically grounded diagnostic for modern simulation-based inference.},
Year = {2025},
Month = {Jul},
Url = {http://arxiv.org/abs/2507.17026v1},
File = {2507.17026v1.pdf}
}
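
Below is a minimal, hypothetical sketch of the conformal classifier two-sample test (C2ST) idea described in the abstract: train any classifier, possibly weak or overfit, to distinguish draws from the true posterior p(theta | y) from draws from the NPE approximation q(theta | y), then convert held-out classifier scores into finite-sample conformal p-values by ranking. The simulated data, the classifier choice, and the final summary step are illustrative assumptions, not the paper's exact procedure.

# Hypothetical illustration (not the paper's exact procedure): a conformal
# variant of the classifier two-sample test, following the recipe sketched
# in the abstract. Requires numpy and scikit-learn.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-ins for draws from the true posterior p(theta | y) and from the
# NPE approximation q(theta | y); here q is deliberately slightly biased.
theta_p = rng.normal(loc=0.0, scale=1.0, size=(2000, 2))
theta_q = rng.normal(loc=0.3, scale=1.0, size=(2000, 2))

# Split each sample: half to train a (possibly weak) classifier, half held
# out for the conformal step.
p_train, p_cal = theta_p[:1000], theta_p[1000:]
q_train, q_test = theta_q[:1000], theta_q[1000:]

# Train any classifier to separate p-draws (label 0) from q-draws (label 1).
X = np.vstack([p_train, q_train])
y = np.concatenate([np.zeros(len(p_train)), np.ones(len(q_train))])
clf = LogisticRegression().fit(X, y)

# Score the held-out draws; a higher score means "looks more like q".
cal_scores = clf.predict_proba(p_cal)[:, 1]
test_scores = clf.predict_proba(q_test)[:, 1]

# Conformal p-value for each held-out q-draw: its rank among the calibration
# scores from p. If q = p, each individual p-value is (super-)uniform in
# finite samples, no matter how weak or overfit the classifier is.
n_cal = len(cal_scores)
pvals = (1 + (cal_scores[None, :] >= test_scores[:, None]).sum(axis=1)) / (n_cal + 1)

# Illustrative summary only: the p-values share one calibration set and are
# therefore dependent; the paper's aggregation into a single test statistic
# is not reproduced here. Small values suggest q differs from p.
print(f"mean conformal p-value over held-out q-draws: {pvals.mean():.3f}")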