SPEECH COMMUNICATION AND INTELLIGIBILITY ENHANCEMENT BY MACHINE LEARNING ALGORITHM

K. Shireesha, T. RajaShekar

Abstract


Information-theoretic concepts have been used to analyze human hearing and to define measures of intelligibility. These models do not have the notion of production noise, but the model of  considers sensory noise. Based on the assumption that different algorithms are likely to have different strengths and suffer from different flaws, we investigate the possibility of combining the strengths of multiple speech enhancement algorithms by formulating the problem in an ensemble learning framework. As a first example of such a system, we consider the prediction of a time-frequency mask obtained from the clean speech, based on the outputs of various algorithms applied to the noisy mixture. We consider several approaches involving various notions of context and various machine learning algorithms: classification in the case of binary masks, and regression in the case of continuous masks. We show that combining several algorithms in this way can lead to an improvement in enhancement performance.
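The binary-mask case described above can be illustrated with a small sketch. It is not the authors' system. It assumes a 0 dB local-SNR threshold for the target mask, uses synthetic "algorithm outputs" (the target mask corrupted by Gaussian noise) in place of real enhancement algorithms, and uses a plain logistic regression as a stand-in classifier. All function names (ideal_binary_mask, train_logistic, predict_mask, run_demo) and parameter values are hypothetical.

```python
import math
import random


def ideal_binary_mask(clean_power, noise_power, threshold_db=0.0):
    """Label a time-frequency bin 1 if its local SNR exceeds the threshold.

    The 0 dB default is an assumption made for this sketch.
    """
    eps = 1e-12  # avoids taking the log of zero for silent bins
    mask = []
    for s, n in zip(clean_power, noise_power):
        snr_db = 10.0 * math.log10((s + eps) / (n + eps))
        mask.append(1 if snr_db > threshold_db else 0)
    return mask


def train_logistic(features, labels, epochs=200, lr=0.5):
    """Fit a logistic regression by batch gradient descent (stand-in classifier)."""
    n_feat = len(features[0])
    w = [0.0] * n_feat
    b = 0.0
    m = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * n_feat
        grad_b = 0.0
        for x, y in zip(features, labels):
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            err = 1.0 / (1.0 + math.exp(-z)) - y  # predicted probability minus label
            for j in range(n_feat):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / m for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def predict_mask(features, w, b):
    """Predict a binary mask value for each bin from the per-algorithm features."""
    return [1 if b + sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
            for x in features]


def run_demo(seed=0, n_bins=600, noise_levels=(0.4, 0.4, 0.4)):
    """Train on two thirds of the synthetic bins, return accuracy on the rest."""
    rng = random.Random(seed)
    clean = [rng.expovariate(1.0) for _ in range(n_bins)]
    noise = [rng.expovariate(1.0) for _ in range(n_bins)]
    target = ideal_binary_mask(clean, noise)
    # Hypothetical stand-ins for enhancement algorithms: each one emits the
    # target mask corrupted by its own Gaussian error.
    features = [[t + rng.gauss(0.0, s) for s in noise_levels] for t in target]
    split = 2 * n_bins // 3
    w, b = train_logistic(features[:split], target[:split])
    predicted = predict_mask(features[split:], w, b)
    hits = sum(p == t for p, t in zip(predicted, target[split:]))
    return hits / (n_bins - split)


if __name__ == "__main__":
    print("held-out mask accuracy:", run_demo())
```

Each time-frequency bin is treated here as an independent training example with one feature per algorithm. The notions of context mentioned above (neighboring bins in time or frequency) would enlarge the feature vector, and the continuous-mask case would replace the classifier with a regressor.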


Keywords


Ensemble Learning; Speech Enhancement; Time-Frequency Mask; Intelligibility; Speech







Copyright © 2012-2023, All rights reserved. | ijitr.com

Creative Commons License
International Journal of Innovative Technology and Research is licensed under a Creative Commons Attribution 3.0 Unported License. Based on a work at IJITR. Permissions beyond the scope of this license may be available at http://creativecommons.org/licenses/by/3.0/deed.en_GB.