CAPSE-ViT: A Lightweight Framework for Underwater Acoustic Vessel Classification Using Coherent Spectral Estimation and Modified Vision Transformer

Authors

  • Najamuddin NAJAMUDDIN Faculty of Electrical Engineering, Universiti Teknologi Malaysia, UTM Skudai, Malaysia
  • Usman Ullah SHEIKH Faculty of Electrical Engineering, Universiti Teknologi Malaysia, UTM Skudai, Malaysia
  • Ahmad Zuri SHA’AMERI Faculty of Electrical Engineering, Universiti Teknologi Malaysia, UTM Skudai, Malaysia

Abstract

Underwater acoustic target classification has become a key research area for marine vessel identification, where machine learning (ML) models are leveraged to classify targets automatically. A major challenge is embedding domain-specific knowledge into ML frameworks so that the extracted features effectively distinguish between vessel types. In this study, we propose a model built on the coherently averaged power spectral estimation (CAPSE) algorithm. Vessel frequency spectra are first computed through CAPSE analysis, capturing key machinery characteristics. These features are then processed by a vision transformer (ViT) network, which learns more complex relationships and patterns within the data and thereby improves classification performance. The ViT achieves this through self-attention mechanisms that capture global dependencies between features, allowing the model to attend to relationships across the entire input. Evaluated on the standard DeepShip and ShipsEar datasets, the proposed model achieved classification accuracies of 97.98% and 99.19%, respectively, while using just 1.90 million parameters, outperforming models such as ResNet18 and UATR-Transformer in both accuracy and computational efficiency. This work contributes to the development of efficient marine vessel classification systems for underwater acoustic applications, demonstrating that high performance can be achieved with reduced computational complexity.
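The idea behind coherent spectral averaging can be illustrated with a short sketch: unlike Welch-style averaging of segment power spectra, the complex segment spectra are phase-compensated for their time offsets before averaging, so stable tonal (machinery) components add in phase while broadband noise tends to cancel. The sketch below is a simplified illustration under these assumptions, not the exact CAPSE algorithm of the paper; the function name, segment length, and hop size are choices made here for demonstration.

```python
import numpy as np

def coherent_psd(x, seg_len=1024, hop=512):
    """Simplified coherently averaged power spectral estimate (illustrative sketch,
    not the paper's exact CAPSE algorithm)."""
    window = np.hanning(seg_len)
    k = np.arange(seg_len // 2 + 1)          # rfft bin indices
    acc = np.zeros(seg_len // 2 + 1, dtype=complex)
    n = 0
    for s in range(0, len(x) - seg_len + 1, hop):
        spec = np.fft.rfft(window * x[s:s + seg_len])
        # Undo the linear phase advance each bin accumulates over the segment
        # offset s, so a stable tone at a bin frequency adds coherently across
        # segments while noise phases stay random and average toward zero.
        acc += spec * np.exp(-2j * np.pi * k * s / seg_len)
        n += 1
    return np.abs(acc / n) ** 2
```

For a noisy sinusoid at a bin-center frequency, the coherently averaged spectrum shows a pronounced peak at the tone's bin, which is the property that makes narrowband machinery lines stand out for the downstream classifier.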

Keywords:

underwater acoustic targets, CAPSE, vision transformer, CNN, LOFAR gram
