Synthetic Designed Experiments for Diagnosing Vision Model Failures

Krisanu Sarkar
Indian Institute of Technology Bombay
Under Review at CVPR SynData4CV 2026

Abstract

Current synthetic data pipelines for computer vision generate images without diagnosing what the downstream model actually needs. We propose Synthetic Designed Experiments for Representational Sufficiency (SDRS), a principled framework based on the statistical theory of Design of Experiments (DoE). SDRS treats the downstream model as a black-box system and the synthetic generator as an experimental apparatus. Using fractional factorial designs, SDRS efficiently audits a model's factor-sensitivity profile via ANOVA decomposition, identifying coverage failures (Type I gaps) and spurious dependencies (Type II gaps).
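As a minimal illustration of the fractional factorial designs mentioned above, the sketch below builds a two-level half-fraction design with NumPy. The function name `half_fraction`, the choice of four factors, and the factor names are assumptions made for illustration only; the source does not specify which design or factor levels SDRS actually uses.

```python
import itertools
import numpy as np

def half_fraction(n_factors):
    """Build a 2^(k-1) half-fraction design with levels coded as -1/+1.

    The first k-1 columns form a full factorial; the last column is set to
    the product of the others (defining relation I = F1*F2*...*Fk).
    """
    base = np.array(list(itertools.product([-1, 1], repeat=n_factors - 1)))
    last = np.prod(base, axis=1, keepdims=True)
    return np.hstack([base, last])

# Hypothetical factor names; each row is one generator configuration to render.
factor_names = ["shape", "scale", "orientation", "position"]
design = half_fraction(len(factor_names))

# 8 runs instead of the 16 a full 2^4 factorial would need.
for run in design:
    print(dict(zip(factor_names, run.tolist())))
```

The columns of this design are balanced and mutually orthogonal, so each main effect can be estimated independently of the others. The price of halving the run count is aliasing: in this particular design, two-factor interactions are confounded with each other in pairs.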

Theoretical Framework: ANOVA Decomposition

SDRS leverages the Analysis of Variance (ANOVA) to decompose the model's response (e.g., loss or accuracy) into contributions from individual scene factors and their interactions. For a set of factors \(\{F_1, F_2, \dots, F_n\}\), the total variance in model performance is partitioned as:

\[ SS_{\text{total}} = \sum_{i} SS_{F_i} + \sum_{i < j} SS_{F_i \times F_j} + SS_{\text{error}} \]

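The F-statistic for a factor is the ratio of its mean square to the error mean square. A large value indicates that the factor explains far more variance in model performance than residual noise does, i.e., the model is highly sensitive to that factor, which points to a potential representational gap or bias.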
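The decomposition above can be sketched numerically for the simplest case: a balanced, replicated two-level full factorial design with factors coded as -1/+1. This is an illustrative sketch, not the authors' implementation. The function name `anova_two_level`, the simulated response, and the planted effect on factor 0 are all assumptions. In this sketch, interactions above second order are absorbed into the error term.

```python
import itertools
import numpy as np

def anova_two_level(design, response):
    """ANOVA for a balanced, orthogonal two-level design coded as -1/+1.

    Returns sums of squares for main effects and two-way interactions,
    the error sum of squares, the total sum of squares, and F-statistics.
    """
    design = np.asarray(design, dtype=float)
    y = np.asarray(response, dtype=float)
    n_runs, n_factors = design.shape
    ss_total = float(np.sum((y - y.mean()) ** 2))

    # Contrast columns: one per main effect, one per two-way interaction.
    terms = {(i,): design[:, i] for i in range(n_factors)}
    for i, j in itertools.combinations(range(n_factors), 2):
        terms[(i, j)] = design[:, i] * design[:, j]

    # For orthogonal -1/+1 contrasts, SS = (contrast . y)^2 / N, with 1 df each.
    ss = {name: float(np.dot(col, y) ** 2 / n_runs) for name, col in terms.items()}
    ss_error = ss_total - sum(ss.values())
    df_error = n_runs - 1 - len(terms)
    ms_error = ss_error / df_error
    f_stats = {name: value / ms_error for name, value in ss.items()}
    return ss, ss_error, ss_total, f_stats

# Simulated audit (assumed data): 2^3 full factorial, 4 replicates per cell.
rng = np.random.default_rng(0)
design = np.array(list(itertools.product([-1, 1], repeat=3)) * 4)
# Hypothetical model accuracy that drops when factor 0 is at its high level.
accuracy = 0.6 - 0.2 * design[:, 0] + rng.normal(0.0, 0.02, size=len(design))

ss, ss_error, ss_total, f_stats = anova_two_level(design, accuracy)
for name, f_value in sorted(f_stats.items(), key=lambda kv: -kv[1]):
    print(name, round(f_value, 2))
```

In this toy setup, the term `(0,)` should dominate the ranking, which is the kind of factor-sensitivity signature the audit looks for. A real audit would replace the simulated `accuracy` with per-run measurements from the model under test.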

Experiment 1: Diagnostic on dSprites

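We planted specific biases in a dSprites-based dataset to test whether SDRS could detect them. The audit correctly identified both gap types, and training on the resulting targeted data raised accuracy from 47.4% (no synthetic data) to 79.0%, compared with 53.8% for random synthetic data and 53.5% for domain randomization.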

dSprites Experiment Results
Figure 1: ANOVA Audit on dSprites. The F-statistics reveal high sensitivity to shape and orientation before correction, which is mitigated after targeted synthetic data intervention.

Accuracy Comparison

Condition                        Accuracy
No Synthetic Data (Baseline)     47.4%
Random Synthetic Data            53.8%
Domain Randomization             53.5%
SDRS (Targeted)                  79.0%

Experiment 2: Dense Segmentation

In a procedural scene segmentation task, SDRS detected background-complexity shortcuts that limited model generalization.

Segmentation Experiment Results
Figure 2: Segmentation Audit. The audit identifies background complexity as a major factor influencing model performance (Type II gap).

mIoU Performance

Method               mIoU
Baseline             0.332
Random Sampling      0.976
SDRS (Targeted)      0.998

Experiment 3: Entanglement Detection

SDRS can also be used to audit the generator itself, identifying cross-factor contamination in imperfect synthetic pipelines.

Entanglement Detection Results
Figure 3: Entanglement Audit. The ANOVA decomposition identifies "leaked" factors where the generator fails to maintain independent control over scene parameters.
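One way to picture a generator audit of this kind is to vary the requested factors according to a designed experiment, measure an attribute of the rendered output, and check how much of its variance is explained by factors that should not influence it. The sketch below is a toy under stated assumptions: the factor names, the simulated leaky generator, and the 5% threshold are all hypothetical and are not taken from the source.

```python
import itertools
import numpy as np

def variance_shares(design, measured):
    """Share of variance in a measured attribute explained by each requested
    factor, for a balanced orthogonal design coded as -1/+1."""
    design = np.asarray(design, dtype=float)
    y = np.asarray(measured, dtype=float)
    ss_total = np.sum((y - y.mean()) ** 2)
    ss_factor = (design.T @ y) ** 2 / len(y)
    return ss_factor / ss_total

# Requested factors (hypothetical names), 2^3 full factorial, 5 replicates.
factor_names = ["brightness", "scale", "background"]
design = np.array(list(itertools.product([-1, 1], repeat=3)) * 5)

# Toy leaky generator (assumption): the measured brightness follows the
# requested brightness but is also contaminated by the background setting.
rng = np.random.default_rng(0)
measured_brightness = (
    0.5
    + 0.2 * design[:, 0]   # intended control
    + 0.1 * design[:, 2]   # leak from the background factor
    + rng.normal(0.0, 0.01, size=len(design))
)

shares = variance_shares(design, measured_brightness)
threshold = 0.05  # hypothetical cutoff for flagging a leak
leaked = [
    name
    for name, share in zip(factor_names, shares)
    if name != "brightness" and share > threshold
]
print(dict(zip(factor_names, shares.round(3).tolist())), leaked)
```

Here the background factor should be flagged while scale should not, mirroring the notion of a "leaked" factor in the caption above.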

Conclusion

SDRS transforms synthetic data generation from a "hit-or-miss" random process into a principled diagnostic tool. By applying Design of Experiments to vision models, we can systematically identify and fix representational failures, leading to more robust and reliable AI systems.