Book Details


IT Skills Show & International Conference on Advancements in Computing Resources (SSICACR-2017), 15 and 16 February 2017, Alagappa University, Karaikudi, Tamil Nadu, India. International Journal of Computer Science (IJCS), published by SK Research Group of Companies (SKRGC).



The purpose of this project is to develop an online e-voting system that uses speech detection. Online voting (e-voting) would be more convenient, relatively secure, and use fewer resources. Being able to access an e-voting system from a personal, business, or even public library computer may be more convenient for many people who need to vote. Voice Activity Detection (VAD) is a very important front-end processing step in all speech and audio processing applications. The performance of most, if not all, speech and audio processing methods depends crucially on the performance of VAD. An ideal voice activity detector should be independent of application area and noise condition and require minimal parameter tuning in real applications. In this paper, a nearly ideal VAD algorithm is proposed that is both easy to implement and noise robust compared to some previous methods. The proposed method uses short-term features such as Spectral Flatness (SF) and Short-term Energy, which makes it suitable for online processing tasks. The proposed method was evaluated on several speech corpora with additive noise and compared with some of the most recently proposed algorithms. The experiments show satisfactory performance in various noise conditions.
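To make the two short-term features concrete, the following is a minimal Python sketch of a frame-level decision based on short-term energy and spectral flatness. It is an illustration under stated assumptions, not the algorithm proposed in the paper: the function names, the frame length, the two threshold values, and the rule "high energy and low flatness means speech" are all hypothetical choices made here, since the abstract does not specify them.

```python
import cmath
import math
import random


def frame_features(frame):
    """Return (short-term energy, spectral flatness in dB) for one frame."""
    n = len(frame)
    # Short-term energy: mean squared amplitude of the frame.
    energy = sum(x * x for x in frame) / n
    # Naive DFT power spectrum over bins 1 .. n/2 - 1 (DC bin skipped).
    # A real system would use an FFT; this keeps the sketch dependency-free.
    power = []
    for k in range(1, n // 2):
        s = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(frame))
        power.append(abs(s) ** 2 + 1e-12)  # small floor avoids log(0)
    # Spectral flatness = geometric mean / arithmetic mean of the spectrum.
    log_geo_mean = sum(math.log(p) for p in power) / len(power)
    arith_mean = sum(power) / len(power)
    flatness_db = 10.0 * (log_geo_mean / math.log(10) - math.log10(arith_mean))
    return energy, flatness_db


def is_speech(frame, energy_threshold=0.01, flatness_threshold_db=-6.0):
    """Hypothetical rule: enough energy AND a peaky (non-flat) spectrum.

    Both thresholds are assumed values for illustration only.
    """
    energy, flatness_db = frame_features(frame)
    return energy > energy_threshold and flatness_db < flatness_threshold_db


# Synthetic frames: a loud tone stands in for voiced speech,
# low-level Gaussian noise stands in for background.
N = 64
tone = [0.5 * math.sin(2 * math.pi * 4 * i / N) for i in range(N)]
rng = random.Random(0)
quiet_noise = [rng.gauss(0.0, 0.01) for _ in range(N)]

print("tone  ->", is_speech(tone))
print("noise ->", is_speech(quiet_noise))
```

Both features are computed from a single frame, with no look-ahead, which is why features of this kind suit online processing. Spectral flatness is at most 0 dB; it approaches 0 dB for noise-like spectra and becomes strongly negative for tonal or harmonic spectra.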



  • Format Volume 5, Issue 1, No 19, 2017
  • Copyright All Rights Reserved ©2017
  • Year of Publication 2017
  • Author S. Nasrin, Dr. V. Palanisamy, R. Anandha Jothi
  • Reference IJCS-249
  • Page No 1580-1588
