Journal of Radio Electronics. eISSN 1684-1719. 2025. № 11
Full text in Russian (pdf)
DOI: https://doi.org/10.30898/1684-1719.2025.11.4
INTERPRETABLE ACOUSTIC CLASSIFICATION
OF SMALL-SIZED AIRCRAFT
USING THE GRAD-CAM METHOD
E.I. Minakov1, V.N. Gridin2, D.G. Andronichev1,
V.I. Solodovnikov2, A.A. Sychugov1, Yu.V. Frantsuzova1
1Tula State University,
300012, Russia, Tula, Lenina ave., 92
2Design Information Technologies Center of the Russian Academy of Sciences,
143003, Russia, Odintsovo, Marshal Biryuzova str., 7A
The paper was received on October 6, 2025.
Abstract. The rapid growth of small unmanned aerial vehicles (UAVs) creates new challenges for timely detection and reliable classification, especially in security-sensitive environments. Traditional acoustic analysis methods, usually based on spectral features, often provide limited accuracy and are highly sensitive to environmental noise. Deep learning approaches, particularly convolutional neural networks (CNNs), have demonstrated clear improvements, but their “black-box” nature reduces trust in predictions and complicates deployment in practice.
The goal of this study is to develop an interpretable acoustic classification model for small UAVs that combines high recognition accuracy with transparent decision-making. The proposed solution employs a CNN architecture optimized with the Keras Tuner framework, which automatically searches over hyperparameters such as the convolutional kernel size, the number of filters, the dense-layer size, regularization coefficients, and the learning rate. The model processes Mel-frequency cepstral coefficients (MFCCs) extracted from UAV recordings, using a fixed input shape of (40, 173, 1) to represent both spectral and temporal information. The optimized network comprises four convolutional blocks with 16, 32, 192, and 128 filters, each followed by batch normalization, ReLU activation, and max pooling. A global average pooling layer reduces feature dimensionality before a dense layer of 64 neurons with dropout regularization, and a final softmax layer outputs probabilities over nine UAV classes. To improve interpretability, Gradient-weighted Class Activation Mapping (Grad-CAM) is integrated into the analysis pipeline, enabling visualization of the time-frequency regions most relevant to each classification decision.
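For illustration, the following is a minimal sketch of this pipeline, assuming TensorFlow/Keras, the keras_tuner package, and librosa for feature extraction. The sampling rate, clip duration, hop length, search ranges, dropout rate, and optimizer are assumptions made for the sketch; the (40, 173, 1) input shape, the four-block convolutional layout, and the nine-class softmax output follow the description above, and the tuned configuration reported in the paper (16, 32, 192, and 128 filters; a 64-neuron dense layer) lies inside this search space.

```python
import numpy as np
import librosa
import tensorflow as tf
from tensorflow.keras import layers, models
import keras_tuner as kt

NUM_CLASSES = 9             # UAV classes reported in the paper
INPUT_SHAPE = (40, 173, 1)  # MFCC coefficients x time frames x 1 channel

def extract_mfcc(path, sr=22050, n_mfcc=40, duration=4.0, hop_length=512):
    """Load a recording and compute an MFCC matrix of shape (40, 173, 1).

    4 s of audio at 22.05 kHz with a 512-sample hop yields 1 + 88200 // 512 = 173
    frames; the sampling rate, duration, and hop length are assumptions chosen
    to reproduce the paper's fixed input shape.
    """
    y, _ = librosa.load(path, sr=sr, duration=duration)
    y = librosa.util.fix_length(y, size=int(sr * duration))  # pad short clips
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc, hop_length=hop_length)
    return mfcc[..., np.newaxis]

def build_model(hp: kt.HyperParameters) -> tf.keras.Model:
    """Tunable CNN: four conv blocks, GAP, dense layer with dropout, softmax."""
    inputs = layers.Input(shape=INPUT_SHAPE)
    x = inputs
    kernel = hp.Choice("kernel_size", [3, 5])
    for i in range(4):  # the tuned configuration used 16, 32, 192, 128 filters
        x = layers.Conv2D(hp.Choice(f"filters_{i}", [16, 32, 64, 128, 192]),
                          kernel, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation("relu")(x)
        x = layers.MaxPooling2D(pool_size=2)(x)
    x = layers.GlobalAveragePooling2D()(x)  # collapse time-frequency maps
    x = layers.Dense(hp.Choice("dense_units", [32, 64, 128]),
                     activation="relu",
                     kernel_regularizer=tf.keras.regularizers.l2(
                         hp.Float("l2", 1e-5, 1e-2, sampling="log")))(x)
    x = layers.Dropout(hp.Float("dropout", 0.1, 0.5, step=0.1))(x)
    outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)
    model = models.Model(inputs, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(
            hp.Float("learning_rate", 1e-4, 1e-2, sampling="log")),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"])
    return model

# Example search driver (Hyperband is one of several tuners Keras Tuner offers):
# tuner = kt.Hyperband(build_model, objective="val_accuracy", max_epochs=30)
# tuner.search(x_train, y_train, validation_data=(x_val, y_val))
```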
Experimental evaluation shows that the model achieves 97 % accuracy (95 % CI: 94–99 %), a macro-F1 score of 0.98, and a ROC-AUC of 0.998, while remaining robust to acoustic noise down to 0 dB SNR. Grad-CAM visualizations demonstrate that the network consistently highlights physically meaningful spectral regions, such as the fundamental rotor frequencies and their harmonics, while ignoring irrelevant noise. This confirms that the network's high performance rests on genuine acoustic patterns characteristic of UAVs rather than on spurious artifacts.
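Such overlays follow the standard Grad-CAM formulation of Selvaraju et al.: gradients of the target class score with respect to a convolutional layer's feature maps are global-average-pooled into channel weights, and the ReLU of the weighted sum of maps gives the heatmap. Below is a minimal sketch for a Keras model such as the one above; the layer name "conv_block4" and other identifiers are illustrative assumptions, not names from the paper.

```python
import tensorflow as tf

def grad_cam(model, mfcc_batch, class_index, conv_layer_name="conv_block4"):
    """Class-activation heatmap over the chosen conv layer's time-frequency grid.

    "conv_block4" is a placeholder; pass the actual name of the last conv layer.
    """
    grad_model = tf.keras.models.Model(
        model.inputs,
        [model.get_layer(conv_layer_name).output, model.output])
    with tf.GradientTape() as tape:
        conv_maps, preds = grad_model(mfcc_batch)
        class_score = preds[:, class_index]            # score of the class to explain
    grads = tape.gradient(class_score, conv_maps)      # d(score) / d(feature maps)
    weights = tf.reduce_mean(grads, axis=(1, 2))       # channel weights (GAP of grads)
    cam = tf.einsum("bhwc,bc->bhw", conv_maps, weights)  # weighted sum of maps
    cam = tf.nn.relu(cam)                              # keep positive evidence only
    cam /= tf.reduce_max(cam) + tf.keras.backend.epsilon()  # scale to [0, 1]
    return cam.numpy()
```

The low-resolution heatmap is then upsampled to the (40, 173) MFCC grid and overlaid on the input, so an operator can see which frequency bands and time frames supported the prediction.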
From a practical perspective, the system can support airspace monitoring, perimeter security, and protection of critical infrastructure by providing not only accurate predictions but also visual explanations that are understandable to human operators. This combination increases user trust and facilitates real-time decision-making in operational environments.
The results indicate that incorporating interpretable deep learning methods such as Grad-CAM into UAV acoustic classification systems can significantly enhance both accuracy and transparency. Future work will focus on further improving noise robustness, broadening the range of UAV types considered, and exploring complementary interpretability methods such as Integrated Gradients and SHAP.
Key words: acoustic classification, convolutional neural networks, MFCC features, Grad-CAM, activation visualization, deep learning, spectral analysis, audio signal processing, model interpretability.
Funding: This work was supported by a grant in the form of a subsidy for scientific research from the Committee for Science and Innovation of the Tula Region (grant No. 15 of June 21, 2024).
Corresponding author: Evgeny Ivanovich Minakov, EMinakov@bk.ru
For citation:
Minakov E.I., Gridin V.N., Andronichev D.G., Solodovnikov V.I., Sychugov A.A., Frantsuzova Yu.V. Interpretable acoustic classification of small-sized aircraft using the Grad-CAM method // Journal of Radio Electronics. – 2025. – No. 11. https://doi.org/10.30898/1684-1719.2025.11.4 (In Russian)