ALDEN: Dual-Level Disentanglement with Meta-learning for Generalizable Audio Deepfake Detection

Shenzhen University, Politecnico di Milano, Afirstsoft Technology Group Co.
ACM MM 2025

*Corresponding author
ALDEN Concept Illustration

An illustration of the ALDEN framework, organized along two axes: low-level signal disentanglement (vertical) and high-level semantic disentanglement (horizontal). ALDEN combines dual-level disentangled learning (scissors) with meta-learning (recycling) to improve generalization across different vocoders. By focusing on vocoder-agnostic features and synthesis-relevant cues, it reduces the model's sensitivity to irrelevant variations.

Algorithm

ALDEN Algorithm Pseudocode

Algorithm 1: The Proposed ALDEN Framework

Cross-vocoder and In-the-wild Scenarios

Detailed Results on Different Datasets

BibTeX

If you find our work useful, please consider citing:

@inproceedings{xu2025alden,
  title={ALDEN: Dual-Level Disentanglement with Meta-learning for Generalizable Audio Deepfake Detection},
  author={Yuxiong Xu and Bin Li and Weixiang Li and Sara Mandelli and Viola Negroni and Sheng Li},
  booktitle={Proceedings of the 33rd ACM International Conference on Multimedia (ACM MM)},
  year={2025}
}