Head-Related Transfer Function Selection Using Neural Networks
DOI: https://doi.org/10.1515/aoa-2017-0038
Keywords: head-related transfer function, neural networks, localization, music, audio, anthropometry, pinna
Abstract
In binaural audio systems, an optimal virtual acoustic space requires a set of head-related transfer functions (HRTFs) that closely matches the listener's own. This study aims to select the most appropriate HRTF dataset from a large database without requiring users to complete extensive listening tests. Currently, there is no reliable way to reduce the number of candidate datasets to a smaller, more manageable number without risking the loss of potentially good matches. A neural network is proposed that estimates the appropriateness of HRTF datasets from input vectors of anthropometric measurements. The shapes and sizes of listeners' heads and pinnae were measured using digital photography; the measured anthropometric parameters form the feature vectors used by the neural network. A graphical user interface (GUI) was developed so that participants could listen to music transformed using different HRTFs and rate the fitness of each HRTF dataset. The recorded listening scores were the target outputs used to train the neural networks. The aim was to learn a mapping between anthropometric parameters and listeners' perception scores. Experimental validation was performed on 30 subjects. It is demonstrated that the proposed system produces a much more reliable HRTF selection than previously used methods.
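The mapping described in the abstract, from anthropometric feature vectors to listening scores, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the network has one hidden layer of tanh units trained by plain stochastic gradient descent, the two features and the scores are made-up values scaled to the range 0 to 1, and the names train_score_net, select_hrtf, and dataset_A to dataset_C are hypothetical.

```python
import math
import random


def train_score_net(features, scores, hidden=6, epochs=2000, lr=0.1, seed=0):
    """Fit a one-hidden-layer network (tanh hidden units, linear outputs)
    by stochastic gradient descent on squared error.
    Returns a function mapping a feature vector to predicted scores."""
    rng = random.Random(seed)
    n_in, n_out = len(features[0]), len(scores[0])
    # Each weight row carries one extra entry for the bias term.
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(hidden)]
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(hidden + 1)] for _ in range(n_out)]

    def forward(x):
        xb = list(x) + [1.0]
        h = [math.tanh(sum(w * v for w, v in zip(row, xb))) for row in w1]
        hb = h + [1.0]
        y = [sum(w * v for w, v in zip(row, hb)) for row in w2]
        return xb, hb, y

    for _ in range(epochs):
        for x, target in zip(features, scores):
            xb, hb, y = forward(x)
            err = [yk - tk for yk, tk in zip(y, target)]
            # Back-propagate the output error before the output weights change.
            delta_h = [
                (1.0 - hb[j] ** 2) * sum(err[k] * w2[k][j] for k in range(n_out))
                for j in range(hidden)
            ]
            for k in range(n_out):
                for j in range(hidden + 1):
                    w2[k][j] -= lr * err[k] * hb[j]
            for j in range(hidden):
                for i in range(n_in + 1):
                    w1[j][i] -= lr * delta_h[j] * xb[i]

    return lambda x: forward(x)[2]


def mean_squared_error(predict, features, scores):
    """Average squared difference between predicted and recorded scores."""
    total = sum(
        (p - t) ** 2
        for x, row in zip(features, scores)
        for p, t in zip(predict(x), row)
    )
    return total / (len(scores) * len(scores[0]))


def select_hrtf(predict, listener_features, dataset_names):
    """Return the dataset with the highest predicted score, plus all scores."""
    predicted = predict(listener_features)
    best = max(range(len(dataset_names)), key=lambda k: predicted[k])
    return dataset_names[best], predicted


# Made-up training data: two normalised anthropometric features per listener
# and one normalised listening score per candidate HRTF dataset.
FEATURES = [[0.1, 0.2], [0.3, 0.8], [0.7, 0.4], [0.9, 0.9]]
SCORES = [[0.9, 0.3, 0.1], [0.6, 0.8, 0.2], [0.3, 0.4, 0.7], [0.1, 0.7, 0.9]]
HRTF_NAMES = ["dataset_A", "dataset_B", "dataset_C"]

if __name__ == "__main__":
    predict = train_score_net(FEATURES, SCORES)
    name, predicted = select_hrtf(predict, [0.5, 0.5], HRTF_NAMES)
    print(name, [round(p, 2) for p in predicted])
```

In this sketch a new listener needs only the measured features: the trained network predicts a score for every candidate dataset, and the highest-scoring one is selected without a listening test. A real system would use many more features, listeners, and datasets, and would be evaluated on listeners held out from training.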