I'm using pyaudio to take input from a microphone or read a wav file, and analyze the stream while playing it. I want to only analyze the right channel if the input is stereo. I've been able to extract the data and convert to integers using loops:

        levels = []
        length = len(data)
        if channels == 1:
            for i in range(length//2):
                volume = abs(struct.unpack('<h', data[i:i+2])[0])
                levels.append(volume)
        elif channels == 2:
            for i in range(length//4):
                j = 4 * i + 2
                volume = abs(struct.unpack('<h', data[j:j+2])[0])
                levels.append(volume)

I think this is working correctly. I know it runs without error on a laptop and a Raspberry Pi 3, but it appears to take too long on a Raspberry Pi Zero when simultaneously streaming the output to a speaker. I figure that eliminating the loop and using numpy may help. I assume I need to use np.ndarray to do this, and that the first parameter will be (CHUNK,), where CHUNK is my chunk size for analyzing the audio (I'm using 1024). The format would be '<h', as in the struct code above, I think. But I'm at a loss as to how to code it correctly for each of the two cases (mono, and right channel only for stereo). How do I create the numpy arrays for each of the two cases?

1 Answer

You are reading 16-bit integers from binary data. It looks like you first read everything into the data variable with something like data = f.read(), which is not shown in the question. Then you do:

for i in range(length//2):
    volume = abs(struct.unpack('<h', data[i:i+2])[0])
    levels.append(volume)

BTW, that code is wrong: it should be abs(struct.unpack('<h', data[2*i:2*i+2])[0]). Otherwise consecutive reads overlap and mix bytes from different samples.
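As a minimal sketch of the corrected mono loop, here it is run on a small synthetic buffer. The sample values are made up for illustration and stand in for a real chunk of audio:

```python
import struct

# Hypothetical mono chunk: four 16-bit little-endian samples.
samples = [100, -200, 300, -400]
data = struct.pack('<4h', *samples)

levels = []
for i in range(len(data) // 2):
    # Each sample occupies 2 bytes, so sample i starts at byte 2*i.
    volume = abs(struct.unpack('<h', data[2*i:2*i+2])[0])
    levels.append(volume)

print(levels)  # [100, 200, 300, 400]
```

With the original data[i:i+2] indexing, every second read would straddle two neighbouring samples, and only the first half of the buffer would be visited.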

To do the same with numpy, you should just do this (instead of both f.read() and the whole loop):

data = np.fromfile(f, dtype='<i2')

This is over 100 times faster than the manual thing above in my test on 5 MB of data.

In the second case, you have interleaved left-right-left-right values. Again you can read them all (assuming you have enough memory) and then access only one half:

data = np.fromfile(f, dtype='<i2')
left = data[::2]
right = data[1::2]

This reads everything, even though you need just one half, but it is still much, much faster than the loop.
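To check the interleaving, here is a small sketch that writes a few made-up stereo samples to a temporary file and reads them back with np.fromfile. The file name and the sample values are assumptions for the example only:

```python
import os
import tempfile

import numpy as np

# Hypothetical interleaved stereo samples: L0, R0, L1, R1, L2, R2.
interleaved = np.array([1, -10, 2, -20, 3, -30], dtype='<i2')

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'samples.raw')  # made-up file name
    interleaved.tofile(path)                 # raw samples, no WAV header
    data = np.fromfile(path, dtype='<i2')

left = data[::2]    # even positions: left channel
right = data[1::2]  # odd positions: right channel

print(left.tolist())   # [1, 2, 3]
print(right.tolist())  # [-10, -20, -30]
```

The slices are views on the same array, so picking out one channel does not copy the samples.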

EDIT: If the data is not coming from a file, np.fromfile can be replaced with np.frombuffer. Then you have this:

channel_data = np.frombuffer(data, dtype='<i2')
if channels == 2:
    channel_data = channel_data[1::2]
levels = np.abs(channel_data)
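Wrapped up as a function, this could look roughly like the sketch below. The name chunk_levels and the test buffers are made up for illustration. The cast to int32 is an extra precaution that is not part of the snippet above: the absolute value of -32768 does not fit in a 16-bit integer, so np.abs on an int16 array would leave that sample negative.

```python
import numpy as np

def chunk_levels(data, channels):
    """Return absolute sample levels for a chunk of 16-bit little-endian PCM.

    For stereo input, only the right channel is used.
    """
    samples = np.frombuffer(data, dtype='<i2')
    if channels == 2:
        samples = samples[1::2]  # odd positions hold the right channel
    # Widen to int32 first: abs(-32768) does not fit in int16.
    return np.abs(samples.astype(np.int32))

# Hypothetical chunks standing in for the bytes delivered by pyaudio.
stereo = np.array([5, -7, 6, -32768], dtype='<i2').tobytes()
mono = np.array([5, -7], dtype='<i2').tobytes()

print(chunk_levels(stereo, 2).tolist())  # [7, 32768]
print(chunk_levels(mono, 1).tolist())    # [5, 7]
```

The original bytes object is left untouched, so it can still be passed on to pyaudio for playback.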

2 Comments

Thanks! The data is passed in chunks from pyaudio, which either takes it from a wav file (passing only the data part of the wav file contents to the variable "data") or generates it from microphone input. I also need the binary stream so that pyaudio can play it out, so I can't use the file read part. Assuming I have passed the variable "data", as in my original code, I should use: levels = np.frombuffer(data, dtype='<i2'), correct? Then the rest follows as you posted for getting just the left or right channel.
@ViennaMike Exactly! I added that to the answer also, so you have a complete solution ;)
