Preparing data for speech recognition with a neural network



  • I'm working on speech recognition with neural networks. There's plenty of information about the networks themselves, but worked examples of applying them are harder to find.

    Let's say I have a network with N inputs and M outputs, and there's an audio signal containing some spoken text. How do you make the two work together?

    Say we obtain the spectrum with a Fourier transform. Presumably we cut the audio signal into small pieces, but how is the length of a piece chosen? What happens next? What do we feed to the network? And what does the network usually output: letters corresponding to the sounds, or something else? (The goal is to obtain a written sentence.)

    I'm interested in the algorithm that goes from a spoken sentence to a written one. The information I've found usually covers either the neural network or the Fourier transform, but how they work together is not very clear.



  • The subject is too broad for a short answer - your question deserves a full study. Still, speculation is not forbidden.

    Non-recurrent networks accept fixed-size feature vectors at the input, so the length of the input vector has to be normalized. In the question you said the features are Fourier coefficients computed from the spoken text. Texts vary a lot in length, and to obtain the same number of coefficients the signal length must also be the same. In addition, it would be a problem to find a training corpus of whole texts - such a corpus would be monstrously large and the training time unattainable. Whole texts could be replaced by separate words: the recording could be split into words using the gaps between them. The same logic can be taken a level lower, to recognizing individual syllables or sounds. Here is an example for the word "ровно":

    [Figure: waveform of the word "ровно"]

    The signal has normalized amplitude, and I also cut the pause at the beginning of the recording. As you can see, there is a very noticeable pause between the syllables, so the whole text can be cut into a series of segments - in my view, such small units of information are easier to work with than whole words. These pieces can then be made equal in length (in number of samples), their Fourier coefficients computed, and the result used for training or recognition. Training is a separate and important topic, as is recognition, and more precisely what to do after recognition: for example, when a word has been recognized incorrectly and is not in the dictionary.
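
    As a rough illustration of the preparation steps described here (normalize the amplitude, cut on pauses, equalize the length, compute Fourier coefficients), a minimal sketch might look like the following. Everything in it is an assumption made for the example: the function names, the frame size, the silence threshold and the target length are hypothetical, and the naive DFT only stands in for whatever FFT routine would be used in practice.

```python
import cmath
import math

def normalize(samples):
    """Scale the signal so that its peak absolute amplitude is 1.0."""
    peak = max((abs(s) for s in samples), default=0.0)
    return [s / peak for s in samples] if peak > 0 else list(samples)

def split_on_pauses(samples, frame=4, threshold=0.1, min_pause_frames=2):
    """Return (start, end) sample ranges of loud regions separated by pauses.

    A frame counts as silent when its peak amplitude is below `threshold`
    (an assumed value); `min_pause_frames` silent frames in a row end a segment.
    Leading silence is skipped, which also trims the pause at the start.
    """
    segments, start, silent = [], None, 0
    n_frames = math.ceil(len(samples) / frame)
    for i in range(n_frames):
        chunk = samples[i * frame:(i + 1) * frame]
        if max(abs(s) for s in chunk) >= threshold:
            if start is None:
                start = i * frame
            silent = 0
        elif start is not None:
            silent += 1
            if silent >= min_pause_frames:
                # The segment ends where the run of silent frames began.
                segments.append((start, (i - silent + 1) * frame))
                start, silent = None, 0
    if start is not None:
        segments.append((start, min(len(samples), (n_frames - silent) * frame)))
    return segments

def fit_length(segment, length):
    """Zero-pad or truncate a segment to exactly `length` samples."""
    return (list(segment) + [0.0] * length)[:length]

def dft_magnitudes(segment):
    """Magnitudes of the first n//2 + 1 DFT coefficients (naive O(n^2) DFT)."""
    n = len(segment)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(segment)))
        for k in range(n // 2 + 1)
    ]

def features(samples, length=32):
    """One fixed-size feature vector per pause-delimited segment."""
    signal = normalize(samples)
    return [
        dft_magnitudes(fit_length(signal[a:b], length))
        for a, b in split_on_pauses(signal)
    ]

# Synthetic stand-in for a recording: two tone bursts separated by a pause.
tone = [math.sin(2 * math.pi * 4 * t / 32) for t in range(24)]
signal = [0.0] * 8 + tone + [0.0] * 16 + [0.5 * s for s in tone] + [0.0] * 8
vectors = features(signal)
print(len(vectors), len(vectors[0]))  # prints: 2 17
```

    Each resulting vector has the same size regardless of how long the original piece was, so it could be fed to a network with a fixed number of inputs. Zero-padding is only one way to equalize lengths; resampling the segment is another option with different trade-offs.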



