I'm recording audio into a numpy array dt and then writing it to a .wav file with code like this:

import numpy as np
import scipy.io.wavfile

dt = np.int16(dt / np.max(np.abs(dt)) * 32767)  # scale to the full int16 range
scipy.io.wavfile.write("tmp.wav", samplerate, dt)

After that, I read the file back and run recognition with this code:

import speech_recognition as sr

r = sr.Recognizer()
with sr.AudioFile("tmp.wav") as source:
    audio_text = r.listen(source)
text = r.recognize_google(audio_text, language=lang)  # lang is a language tag such as "en-US"

Can I run recognition directly on the numpy array, without going through a .wav file? Writing the file and reading it back takes extra time.

2 Answers

Assuming SpeechRecognition is the module you are using, its documentation says you can pass a file-like object to AudioFile() instead of a path. File-like objects are objects that support file operations such as read and write.

You should be able to put the byte representation of the WAV file into an io.BytesIO object, which supports these operations, and pass that to your speech recognition module. scipy.io.wavfile.write() supports writing to such file-like objects.

I don't have the package or any WAV files to test it, but let me know if something like this works:

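import io
import scipy.io.wavfile
import speech_recognition as sr
wav_bytes = io.BytesIO()
scipy.io.wavfile.write(wav_bytes, samplerate, dt)  # dt is the int16 array from the question
wav_bytes.seek(0)  # make sure the reader starts at the beginning of the buffer
with sr.AudioFile(wav_bytes) as source:
    ...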
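
If you want to sanity-check the in-memory buffer before involving the recognizer at all, something like the following might help. It is only a sketch: the 16 kHz sample rate and the synthetic 440 Hz tone are assumptions standing in for your recorded dt, and it uses the standard-library wave module just to confirm that the buffer parses as a mono 16-bit WAV.

```python
import io
import wave

import numpy as np
from scipy.io import wavfile

samplerate = 16000  # assumed sample rate for this example

# Stand-in for the recorded signal: one second of a 440 Hz tone.
t = np.arange(samplerate) / samplerate
dt = np.sin(2 * np.pi * 440 * t)
dt = np.int16(dt / np.max(np.abs(dt)) * 32767)

# Write the WAV into memory instead of onto disk.
wav_bytes = io.BytesIO()
wavfile.write(wav_bytes, samplerate, dt)
wav_bytes.seek(0)  # rewind so the reader starts at the header

# Read the header back to confirm the buffer is a valid WAV.
with wave.open(wav_bytes, "rb") as wf:
    channels = wf.getnchannels()
    sample_width = wf.getsampwidth()
    rate = wf.getframerate()
    n_frames = wf.getnframes()

print(channels, sample_width, rate, n_frames)
```

If the header reads back cleanly, the same buffer (rewound with seek(0) again) should be usable wherever a WAV file object is accepted.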

4 Comments

I know that I need to convert the numpy array into some object that SpeechRecognition accepts, but I don't know how to do it: which approach, which functions?
Well, I'm not going to write the code for you. Have you tried anything I suggested, such as using BytesIO?
It doesn't work. Scipy works only with ndarray, but here there is an answer on how to play a numpy array in PyAudio. The new question is how to use a PyAudio stream in SpeechRecognition methods.
I understand what you want to do. Numpy array -> SpeechRecognition method. What I'm telling you is you need to use a file-like object, such as BytesIO for that. This doesn't actually write any files, it's all in memory. The answer you linked does a similar thing; they also use file-like objects.

You can first create an audio data object with AudioData; this is the object that the recognizer's recognize_* methods take as input:

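import io
from scipy.io.wavfile import write
import speech_recognition

byte_io = io.BytesIO()
write(byte_io, sr, audio_array)
# getvalue() returns the whole buffer regardless of the current stream position.
# Note that it still includes the WAV header, which AudioData treats as audio samples.
result_bytes = byte_io.getvalue()

audio_data = speech_recognition.AudioData(result_bytes, sr, 2)  # 2 = bytes per int16 sample
r = speech_recognition.Recognizer()
text = r.recognize_google(audio_data)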

audio_array is a 1-D numpy.ndarray with int16 values and sr is the sampling rate.
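
Since AudioData takes raw PCM frames rather than a WAV file, another option might be to skip the WAV container entirely and hand it the array's bytes. The sketch below assumes a mono signal and 16-bit little-endian samples; to_pcm16_bytes and recognize_array are hypothetical helper names, not part of SpeechRecognition, and I have not run the recognition call against the Google API.

```python
import numpy as np


def to_pcm16_bytes(signal):
    """Scale a 1-D signal to the int16 range and return raw little-endian PCM bytes."""
    signal = np.asarray(signal, dtype=np.float64)
    peak = np.max(np.abs(signal))
    if peak > 0:  # avoid dividing by zero on silent input
        signal = signal / peak
    return (signal * 32767).astype("<i2").tobytes()


def recognize_array(signal, samplerate, lang="en-US"):
    """Hypothetical helper: recognize speech in a mono signal with no WAV step."""
    # Imported here so that to_pcm16_bytes can be used without the package installed.
    import speech_recognition as sr

    # The final argument is the sample width in bytes (2 for int16).
    audio = sr.AudioData(to_pcm16_bytes(signal), samplerate, 2)
    return sr.Recognizer().recognize_google(audio, language=lang)
```

With this, the call would look like recognize_array(dt, samplerate, lang), where dt is the recorded array. The normalization mirrors the one in the question, with a guard for an all-zero signal.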
