"JOURNAL OF RADIO ELECTRONICS" (Zhurnal Radioelektroniki), ISSN 1684-1719, N 11, 2019

DOI 10.30898/1684-1719.2019.11.18

Using deep neural network training for the recognition of user voice commands


A. G. Romanyuk 1, A. N. Smirnov 1, V. M. Antonova 1,2

1 Bauman Moscow State Technical University, 2-d Baumanskaya str., 5-1, Moscow 105005, Russia

2 Kotelnikov Institute of Radioengineering and Electronics of Russian Academy of Sciences, Mokhovaya str., 11-7, Moscow 125009, Russia


The paper was received on November 19, 2019


Abstract. This work is devoted to the use and development of neural networks for speech recognition. The training process was studied using an archive of 7100 audio tracks with indexed labels. The speech signals in those tracks were converted into log-mel spectrograms. The network was trained on input signals that had a smooth distribution and had been normalized. The article describes the ability of the resulting network to recognize different spoken words and to determine whether the incoming signal is silence or background noise, which was achieved by processing 4000 samples of noise clips. The ability of the network to classify several converted incoming signals simultaneously, regardless of the exact position of the speech in time, is investigated. The creation of a virtual device capable of reading the signal from a microphone at a given audio sampling frequency is described. The neural network obtained in this project can be extended to recognize a larger number of voice commands and applied in various spheres of human activity.

Keywords: neural networks, deep learning, speech recognition.
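
The conversion of a speech signal into a log-mel spectrogram mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the mel formula (the common 2595·log10 variant), the frame length, hop size, number of mel bands, the Hann window, and the 1 kHz test tone standing in for a speech track are all assumptions chosen for illustration. A naive DFT is used so that the example needs only the Python standard library; a real pipeline would use an FFT.

```python
import math

def hz_to_mel(f_hz):
    # Common mel-scale formula (an assumption; other variants exist).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale, one row per filter."""
    top = hz_to_mel(sample_rate / 2.0)
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for i in range(n_filters):
        left, centre, right = edges[i], edges[i + 1], edges[i + 2]
        bank.append([
            max(0.0, min((f - left) / (centre - left),
                         (right - f) / (right - centre)))
            for f in bin_hz
        ])
    return bank

def power_spectrum(frame):
    """Naive DFT power for bins 0..N/2 (fine for a sketch, slow for real audio)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2.0 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = -sum(x * math.sin(2.0 * math.pi * k * t / n) for t, x in enumerate(frame))
        spectrum.append((re * re + im * im) / n)
    return spectrum

def log_mel_spectrogram(signal, sample_rate, frame_len=128, hop=64, n_filters=10):
    """Return a list of frames, each a list of log10 mel-band energies."""
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * t / frame_len)
              for t in range(frame_len)]  # Hann window
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + t] * window[t] for t in range(frame_len)]
        power = power_spectrum(frame)
        frames.append([
            math.log10(sum(w * p for w, p in zip(row, power)) + 1e-10)
            for row in bank
        ])
    return frames

# Usage: a 0.1 s, 1 kHz tone sampled at 8 kHz stands in for a speech track.
sample_rate = 8000
tone = [math.sin(2.0 * math.pi * 1000.0 * t / sample_rate) for t in range(800)]
spec = log_mel_spectrogram(tone, sample_rate)
print(len(spec), "frames x", len(spec[0]), "mel bands")
```

The logarithm compresses the large dynamic range of the band energies, which is one common reason log-mel features are used as network input before any further normalization.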



For citation:

Romanyuk A.G., Smirnov A.N., Antonova V.M. Using deep neural network training for the recognition of user voice commands. Zhurnal Radioelektroniki - Journal of Radio Electronics. 2019. No. 11. Available at http://jre.cplire.ru/jre/nov19/18/text.pdf

DOI  10.30898/1684-1719.2019.11.18