LUMINA: A Multi-Vendor Mammography Benchmark with Energy Harmonization Protocol

Hongyi Pan1†, Gorkem Durak1, Halil Ertugrul Aktas1, Andrea M. Bejar1, Baver Tutun2,
Emre Uysal2, Ezgi Bulbul2, Mehmet Fatih Dogan2, Berrin Erok2, Berna Akkus Yildirim2,
Sukru Mehmet Erturk3, Ulas Bagci1†
1Department of Radiology, Northwestern University, Chicago, IL, USA
2Department of Radiation Oncology, University of Health Sciences Prof. Dr. Cemil Tascioglu City Hospital, Istanbul, Turkey
3Department of Radiology, Istanbul University, Istanbul, Turkey

LUMINA is a multi-vendor full-field digital mammography (FFDM) benchmark of 1,824 images from six vendors, all with pathology-confirmed labels. We propose a foreground-only energy harmonization method that reduces acquisition-induced appearance variability by mapping images to a low-energy reference. In our benchmark, harmonization improves performance across architectures, with EfficientNet-B0 reaching 93.54% AUC for breast cancer diagnosis.

Pipeline

Abstract

Publicly available full-field digital mammography (FFDM) datasets remain limited in size, clinical annotations, and vendor diversity, hindering the development of robust models. We introduce LUMINA, a curated, multi-vendor FFDM dataset that explicitly encodes acquisition energy and vendor metadata to capture clinically relevant appearance variations often overlooked in existing benchmarks. This dataset contains 1,824 images from 468 patients (960 benign, 864 malignant), with pathology-confirmed labels, BI-RADS assessments, and breast-density annotations. LUMINA spans six acquisition systems and includes both high- and low-energy imaging styles, enabling systematic analysis of vendor- and energy-induced domain shifts. To address these variations, we propose a foreground-only pixel-space alignment method ("energy harmonization") that maps images to a low-energy reference while preserving lesion morphology. We benchmark CNN and transformer models on three clinically relevant tasks: diagnosis (benign vs. malignant), BI-RADS classification, and density estimation. Two-view models consistently outperform single-view models. EfficientNet-B0 achieves an AUC of 93.54% for diagnosis, while Swin-T achieves the best macro-AUC of 89.43% for density prediction. Harmonization improves performance across architectures and produces more localized Grad-CAM responses. Overall, LUMINA provides (1) a vendor-diverse benchmark and (2) a model-agnostic harmonization framework for reliable and deployable mammography AI.
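
To make the idea of a foreground-only pixel-space alignment concrete, the sketch below matches the intensity distribution of the breast region in one image to that of a reference image and leaves the background untouched. It is an illustration under stated assumptions, not the method used in LUMINA: the abstract does not specify the mapping, so quantile (histogram) matching is swapped in here, and the threshold-based `foreground_mask`, the function names, and the `threshold` parameter are all hypothetical.

```python
import numpy as np


def foreground_mask(img, threshold=0.05):
    """Hypothetical foreground rule: pixels brighter than a fraction of the max.

    A real pipeline would likely use a proper breast segmentation instead.
    """
    return img > threshold * img.max()


def harmonize_foreground(src, ref, threshold=0.05):
    """Quantile-match the foreground of `src` to the foreground of `ref`.

    Assumption: histogram matching stands in for the unspecified
    "energy harmonization" mapping. Background pixels are not modified.
    """
    src = np.asarray(src, dtype=np.float64)
    ref = np.asarray(ref, dtype=np.float64)

    src_mask = foreground_mask(src, threshold)
    ref_mask = foreground_mask(ref, threshold)
    src_fg = src[src_mask]
    ref_fg = ref[ref_mask]

    # Nothing to align if either image has no detected foreground.
    if src_fg.size == 0 or ref_fg.size == 0:
        return src.copy()

    # Empirical CDF value (quantile) of every source foreground pixel.
    src_sorted = np.sort(src_fg)
    quantiles = np.searchsorted(src_sorted, src_fg, side="right") / src_fg.size

    # Look up the reference intensity found at the same quantile.
    ref_sorted = np.sort(ref_fg)
    ref_quantiles = np.arange(1, ref_fg.size + 1) / ref_fg.size
    matched = np.interp(quantiles, ref_quantiles, ref_sorted)

    out = src.copy()
    out[src_mask] = matched
    return out
```

Because the quantile mapping is monotonic, the brightness ordering of foreground pixels is kept, which is one simple way to change global appearance without reordering local contrast. Whether this is enough to preserve lesion morphology in practice would need to be checked against the paper's own protocol.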

Dataset

Samples

comparision.png

distribution.png

Benchmark

Diagnosis

BI-RADS

Density
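
Density prediction is a multi-class task and is scored here with macro-AUC. As a reading aid, the sketch below computes macro-AUC under the common one-vs-rest definition: one binary AUC per class, averaged with equal weight. That definition is an assumption, since this page does not state which variant was used, and the function names are hypothetical.

```python
import numpy as np


def binary_auc(y_true, scores):
    """Binary AUC from the Mann-Whitney U statistic, with average ranks for ties."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=np.float64)
    n_pos = int(y_true.sum())
    n_neg = y_true.size - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative sample")

    # 1-based average rank of each score; tied scores share the same rank.
    _, inverse, counts = np.unique(scores, return_inverse=True, return_counts=True)
    cumulative = np.cumsum(counts)
    average_rank = cumulative - (counts - 1) / 2.0
    ranks = average_rank[inverse]

    u_statistic = ranks[y_true].sum() - n_pos * (n_pos + 1) / 2.0
    return u_statistic / (n_pos * n_neg)


def macro_auc(y_true, probs):
    """Unweighted mean of one-vs-rest AUCs (assumed definition of macro-AUC).

    y_true: integer class labels, shape (n_samples,)
    probs:  per-class scores, shape (n_samples, n_classes)
    """
    y_true = np.asarray(y_true)
    probs = np.asarray(probs, dtype=np.float64)
    per_class = [
        binary_auc(y_true == k, probs[:, k]) for k in range(probs.shape[1])
    ]
    return float(np.mean(per_class))
```

Equal weighting means a rare density category counts as much as a common one, which is why macro-averaging is often preferred when classes are imbalanced.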

Harmonization improves ACC/AUC/F1 and makes attention more focal

improvement

BibTeX Citation

@inproceedings{pan2026lumina,
  title={LUMINA: A Multi-Vendor Mammography Benchmark with Energy Harmonization Protocol},
  author={Pan, Hongyi and Durak, Gorkem and Aktas, Halil Ertugrul and Bejar, Andrea M and Tutun, Baver and Uysal, Emre and Bulbul, Ezgi and Dogan, Mehmet Fatih and Erok, Berrin and Yildirim, Berna Akkus and others},
  booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
  year={2026}
}