I'm reading a buffer of bytes recorded through my computer's microphone (2 channels), using the pyaudio example taken from the site.

import pyaudio
import wave

CHUNK = 1024
FORMAT = pyaudio.paInt16
CHANNELS = 2
RATE = 44100
RECORD_SECONDS = 5
WAVE_OUTPUT_FILENAME = "output.wav"

p = pyaudio.PyAudio()

stream = p.open(format=FORMAT,
                channels=CHANNELS,
                rate=RATE,
                input=True,
                frames_per_buffer=CHUNK)

print("* recording")

frames = []

for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    data = stream.read(CHUNK)
    frames.append(data)

print("* done recording")

print frames

frames looks like this:

['\x00\xfd\xff\xff.....\xfc\xff\xff', '\xff\xfc\xff\xff......\xfc\xff\xff', ... ]

or if I change CHUNK = 1:

['\x00\xfd\xff\xff', '\xff\xfc\xff\xff', '\x00\xfd\xcc\xcf']

though of course much longer. I suspect that the bytes for the two channels are interleaved, so I think I need to split them into pairs.

What I'd like is an array like this:

np.array([
  [123, 43],
  [3, 433],
  [43, 66]
])

where the first column holds the values from the first channel, and the second column those from the second channel. How do I go about interpreting these encoded values (with CHUNK set to a reasonable value like 1024)?


UPDATE:

I'm quite confused. I used the code below to turn the list of strings into a single string of space-separated hex values, but there appears to be an odd number of them, which shouldn't happen if every frame holds two values, one for each channel (the count would be even):

fms = ''.join(frames)
fms_string = ''.join( [ "%02X " % ord( x ) for x in fms ] ).strip()
fms_list = fms_string.split(" ")
print len(fms_list) # this prints an ODD number...

UPDATE 2:

I tried a simpler route and tried this:

import array
fstring = ''.join(frames)
wave_nums = array.array('h', fstring) # this correctly returns list of ints!
print len(wave_nums) 

I tried this for different recording times and got the following (confusing results):

RECORD_SECONDS = 2 ---> len(wave_nums) is 132300 (132300 / 44100 = 3 seconds of frames)
RECORD_SECONDS = 4 ---> len(wave_nums) is 308700 (308700 / 44100 = 7 seconds of frames)
RECORD_SECONDS = 5 ---> len(wave_nums) is 396900 (396900 / 44100 = 9 seconds of frames)

which implies that I'm getting a number of frames consistent with 2*(number of seconds recording) - 1 seconds. How is this possible?

2 Answers

Based on a quick glance at the portaudio source, it looks like the channels are in fact interleaved.

You can use a join to flatten the list, calculate the left and right values (you set them to be 16 bits long), and then zip the two resulting lists together.

joined = ''.join(frames).encode('latin-1')

# On little-endian machines (e.g. x86) each sample arrives low byte (l) first, then high byte (m).
left = map(lambda l, m: (m << 8) + l, joined[0::4], joined[1::4])
right = map(lambda l, m: (m << 8) + l, joined[2::4], joined[3::4])

zipped = zip(left, right)

On Python 2.x, the encode('latin-1') trick doesn't work, so you'll need to do

joined = ''.join(frames)
joined = map(ord, joined)

# On little-endian machines (e.g. x86) each sample arrives low byte (l) first, then high byte (m).
left = map(lambda l, m: (m << 8) + l, joined[0::4], joined[1::4])
right = map(lambda l, m: (m << 8) + l, joined[2::4], joined[3::4])

zipped = zip(left, right)

This is because a Python 2.x str is already a byte string: calling encode() on it first decodes it implicitly as ASCII, which fails on any byte above 0x7f.
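
On Python 3, where `stream.read` returns `bytes`, the same split can be sketched with the standard `struct` module. This is only a sketch under stated assumptions: 16-bit little-endian samples (what `paInt16` gives on x86 hardware) and two interleaved channels. `split_stereo` is a hypothetical helper name, not part of pyaudio, and unlike the code above it yields signed values.

```python
import struct

def split_stereo(frames):
    """Decode interleaved 16-bit stereo chunks into (ch0, ch1) tuples."""
    joined = b"".join(frames)
    # '<hh' = two little-endian signed 16-bit ints, one sample per channel.
    # Assumption: len(joined) is a multiple of 4, i.e. only whole frames.
    return list(struct.iter_unpack("<hh", joined))

# The first two frames from the question's CHUNK = 1 example.
frames = [b"\x00\xfd\xff\xff", b"\xff\xfc\xff\xff"]
print(split_stereo(frames))  # [(-768, -1), (-769, -1)]
```

The neighbouring values -768 and -769 in the first channel are what a slowly varying signal should look like, which supports the little-endian reading.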

Update:

W.r.t. the odd number of bytes: read might have tried to read too many bytes ahead and failed silently, returning only whatever it had at the moment. Under normal conditions every read should return CHUNK frames, i.e. CHUNK * CHANNELS * 2 bytes for 16-bit samples, which is always an even number. So unless your join code has an error, something is wrong on their end. Try it with mine and see what happens.
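
One way to test that suspicion is to check the length of every chunk before joining. The sketch below assumes 2 bytes per sample for `paInt16`; `short_reads` is a hypothetical helper, and the data is simulated rather than recorded.

```python
CHUNK = 1024
CHANNELS = 2
SAMPLE_WIDTH = 2  # bytes per sample; assumed value for paInt16

def short_reads(frames, chunk=CHUNK, channels=CHANNELS, width=SAMPLE_WIDTH):
    """Return (index, length) for every chunk that is not the expected size."""
    expected = chunk * channels * width
    return [(i, len(f)) for i, f in enumerate(frames) if len(f) != expected]

# Simulated data: two full chunks followed by one truncated chunk.
fake = [bytes(4096), bytes(4096), bytes(4095)]
print(short_reads(fake))  # [(2, 4095)]
```

An empty result means every read returned a full chunk, so an odd total would have to come from the conversion code instead.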


6 Comments

on the encode() line: UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 608: ordinal not in range(128)
and without that, the map() lines fail with: TypeError: unsupported operand type(s) for <<: 'str' and 'int'
This seems to function, but for some reason after recording 1 second of audio and using this code, the lengths of left and right are 44032 (when it should be 44100). Somehow 68 frames are getting lost.
Hmm, probably truncated division when you do RATE / CHUNK: for 1 second that gives 43 chunks, and 43 * 1024 is 44032
I think you want to do the callback version of recording for pyaudio, that way you don't have to worry about manually reading the bytes. However, that small a speedup shouldn't be noticeable to your ear. Try using pyAudio to play the sampled data back.

The simplest answer appears to be this:

import array
f = ''.join(frames)
nums = array.array('h', f)
left = nums[0::2]   # channel 0 comes first in each interleaved frame
right = nums[1::2]  # channel 1 follows it

@Dylan's answer is also good but a bit more verbose, and its values are unsigned, whereas 16-bit wav samples are signed.
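
To get the two-column array the question asks for, a NumPy version might look like the sketch below. It assumes Python 3 (`bytes` chunks), little-endian 16-bit samples, and interleaved channels; `frames_to_array` is a hypothetical helper name.

```python
import numpy as np

def frames_to_array(frames, channels=2):
    """Return an (n_frames, channels) array of signed 16-bit samples."""
    joined = b"".join(frames)
    # '<i2' = little-endian signed 16-bit integers (assumed sample format).
    samples = np.frombuffer(joined, dtype="<i2")
    # Interleaved layout: consecutive samples belong to consecutive channels.
    return samples.reshape(-1, channels)

# The first two frames from the question's CHUNK = 1 example.
frames = [b"\x00\xfd\xff\xff", b"\xff\xfc\xff\xff"]
arr = frames_to_array(frames)
print(arr.shape)  # (2, 2)
```

Column 0 then holds the first channel and column 1 the second, so `arr[:, 0]` and `arr[:, 1]` give the per-channel signals.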

Also, changing CHUNK to 1225 helps: 44100 is a multiple of 1225 (36 * 1225), so no frames are lost to rounding when the number of chunks is computed.
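
The arithmetic behind the lost frames can be sketched as follows, using Python 3 division. Both function names are hypothetical; `frames_captured` mirrors the loop bound in the question, and `chunks_needed` shows the alternative of rounding up instead of changing CHUNK.

```python
import math

RATE = 44100

def frames_captured(chunk, seconds, rate=RATE):
    """Frames read by a loop of int(rate / chunk * seconds) whole chunks."""
    return int(rate / chunk * seconds) * chunk

def chunks_needed(chunk, seconds, rate=RATE):
    """Round up instead, so at least rate * seconds frames are read."""
    return math.ceil(rate * seconds / chunk)

print(frames_captured(1024, 1))  # 44032, i.e. 68 frames short of 44100
print(frames_captured(1225, 1))  # 44100
print(chunks_needed(1024, 1))    # 44
```

Rounding up records slightly more than requested (44 * 1024 = 45056 frames for one second), and the surplus can be sliced off afterwards.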
