How to convert bytes to np.array

Question

I'm reading a buffer of bytes from data recorded through my computer's microphone (2 channels), using a PyAudio example taken from the site.

import pyaudio
import wave

CHUNK = 1024
FORMAT = pyaudio.paInt16
CHANNELS = 2
RATE = 44100
RECORD_SECONDS = 5
WAVE_OUTPUT_FILENAME = "output.wav"

p = pyaudio.PyAudio()

stream = p.open(format=FORMAT,
                channels=CHANNELS,
                rate=RATE,
                input=True,
                frames_per_buffer=CHUNK)

print("* recording")

frames = []

for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    data = stream.read(CHUNK)
    frames.append(data)

print("* done recording")

print frames

frames looks like this:

['\x00\xfd\xff\xff.....\xfc\xff\xff', '\xff\xfc\xff\xff......\xfc\xff\xff', ... ]

or if I change CHUNK = 1:

['\x00\xfd\xff\xff', '\xff\xfc\xff\xff', '\x00\xfd\xcc\xcf']

though of course much longer. I suspect that the bytes are interleaved for each channel so I think I need to break them out in pairs of two.

What I'd like is an array like this:

np.array([
  [123, 43],
  [3, 433],
  [43, 66]
])

where the first column holds the values from the first channel, and the second column those from the second channel. How do I go about interpreting these encoded values (with CHUNK set to a reasonable value like 1024)?

UPDATE:

I'm quite confused. I used the code below to turn the list of strings into a single string of space-separated hex values, but there appears to be an odd number of them, which wouldn't happen if there were two values per frame, one for each channel (the count would be even):

fms = ''.join(frames)
fms_string = ''.join( [ "%02X " % ord( x ) for x in fms ] ).strip()
fms_list = fms_string.split(" ")
print len(fms_list) # this prints an ODD number...

UPDATE 2:

I tried a simpler route and tried this:

import array
fstring = ''.join(frames)
wave_nums = array.array('h', fstring) # this correctly returns list of ints!
print len(wave_nums)

I tried this for different recording times and got the following (confusing results):

RECORD_SECONDS = 2 ---> len(wave_nums) is 132300 (132300 / 44100 = 3 seconds of frames)
RECORD_SECONDS = 4 ---> len(wave_nums) is 308700 (308700 / 44100 = 7 seconds of frames)
RECORD_SECONDS = 5 ---> len(wave_nums) is 396900 (396900 / 44100 = 9 seconds of frames)

which implies that I'm getting a number of frames consistent with 2*(number of seconds recording) - 1 seconds...how is this possible?

Dylan MacKenzie · Accepted Answer · 2013-08-29 18:39:43Z


Based on a quick glance at the PortAudio source, it looks like the channels are in fact interleaved.

You can use a join to flatten the list, combine each pair of bytes into the left and right values (you set them to be 16 bits long), and then zip the two lists together.

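# Assumes each chunk is a str; if stream.read() returns bytes (Python 3),
# use joined = b''.join(frames) instead.
joined = ''.join(frames).encode('latin-1')

# On a little-endian machine (e.g. x86) each 16-bit sample is stored
# low byte first, so the second byte of each pair is the high byte.
left = map(lambda l, m: (m << 8) + l, joined[0::4], joined[1::4])
right = map(lambda l, m: (m << 8) + l, joined[2::4], joined[3::4])

zipped = zip(left, right)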

On Python 2.x, the encode('latin-1') trick doesn't work, so you'll need to do this instead:

joined = ''.join(frames)
joined = map(ord, joined)  # list of byte values (0-255)

# Low byte comes first in each 16-bit little-endian sample.
left = map(lambda l, m: (m << 8) + l, joined[0::4], joined[1::4])
right = map(lambda l, m: (m << 8) + l, joined[2::4], joined[3::4])

zipped = zip(left, right)

This is because in Python 2.x, calling encode on a byte string first implicitly decodes it as ASCII, which fails for any byte above 0x7F.
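As a hedged alternative to combining the bytes by hand, the standard library's struct module can decode each 4-byte frame directly and yields signed values. This is a minimal sketch for Python 3, assuming interleaved 16-bit little-endian samples with the left channel first; the name decode_stereo and the synthetic chunks are illustrative and not part of PyAudio.

```python
import struct

def decode_stereo(frames):
    """Decode a list of byte chunks into (left, right) tuples of signed 16-bit ints.

    Assumes interleaved 16-bit little-endian stereo, left channel first.
    """
    joined = b"".join(frames)
    # '<hh' = two little-endian signed 16-bit integers per 4-byte frame.
    return list(struct.iter_unpack("<hh", joined))

# Synthetic stand-in for the chunks returned by stream.read().
frames = [b"\x00\xfd\xff\xff", b"\xff\xfc\xff\xff"]
pairs = decode_stereo(frames)
print(pairs)  # [(-768, -1), (-769, -1)]
```

Note that struct.iter_unpack raises struct.error if the joined data is not a whole number of 4-byte frames, which also makes a truncated read easy to spot.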

Update:

Regarding the odd number of bytes: read might have tried to read too many bytes ahead and failed silently, returning only whatever it had at that moment. Under normal conditions each read should return a whole number of frames (CHUNK * 4 bytes for two 16-bit channels), so unless your join function has an error, something is wrong on their end. Try it with mine and see what happens.
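To check whether a read came back short, the expected size of each chunk can be computed from the stream parameters. A minimal sketch, assuming 16-bit samples (2 bytes each, matching paInt16); expected_chunk_bytes and short_reads are hypothetical helper names, and the chunks here are synthetic rather than recorded.

```python
CHUNK = 1024
CHANNELS = 2
SAMPLE_WIDTH = 2  # bytes per sample; assumed to match paInt16

def expected_chunk_bytes(chunk=CHUNK, channels=CHANNELS, width=SAMPLE_WIDTH):
    """Bytes that one full read of `chunk` frames should contain."""
    return chunk * channels * width

def short_reads(frames, expected):
    """Return the indices of chunks whose length differs from `expected`."""
    return [i for i, data in enumerate(frames) if len(data) != expected]

expected = expected_chunk_bytes()
# Synthetic chunks: the middle one is 3 bytes short.
frames = [b"\x00" * expected, b"\x00" * (expected - 3), b"\x00" * expected]
bad = short_reads(frames, expected)
print(expected, bad)  # 4096 [1]
```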

edited Aug 29, 2013 at 18:39

answered Aug 29, 2013 at 4:38


6 Comments

lollercoaster Over a year ago

on the encode() line: UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 608: ordinal not in range(128)

lollercoaster Over a year ago

and without that, the map() lines fail with: TypeError: unsupported operand type(s) for <<: 'str' and 'int'

lollercoaster Over a year ago

This seems to function, but for some reason after recording 1 second of audio and using this code, the lengths of left and right are 44032 (when it should be 44100). Somehow 68 frames are getting lost.

Dylan MacKenzie Over a year ago

Hmm, probably truncated division when you do RATE / CHUNK: 44100 / 1024 truncates to 43, and 43 * 1024 is 44032.

Dylan MacKenzie Over a year ago

I think you want to use the callback version of recording in PyAudio; that way you don't have to worry about manually reading the bytes. However, that small a speedup shouldn't be noticeable to your ear. Try using PyAudio to play the sampled data back.


lollercoaster · Accepted Answer · 2013-08-29 19:48:45Z


The simplest answer appears to be this:

import array
f = ''.join(frames)
nums = array.array('h', f)
left = nums[0::2]   # channel 0 (left) comes first in each interleaved frame
right = nums[1::2]

@Dylan's answer is also good, but it is a bit more verbose and its values are unsigned, whereas WAV sample values are signed.
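Since the question asks for a NumPy array with one column per channel, the same decoding can be sketched with np.frombuffer. This assumes NumPy is installed and that the data is interleaved 16-bit little-endian stereo with the left channel first; the chunks below are synthetic stand-ins for recorded data.

```python
import numpy as np

# Synthetic stand-in for the chunks returned by stream.read().
frames = [b"\x00\xfd\xff\xff", b"\xff\xfc\xff\xff"]
joined = b"".join(frames)

# '<i2' = little-endian signed 16-bit integers.
samples = np.frombuffer(joined, dtype="<i2")

# Each row is one frame: column 0 is the left channel, column 1 the right.
stereo = samples.reshape(-1, 2)
left, right = stereo[:, 0], stereo[:, 1]
print(stereo)
```

The array returned by np.frombuffer is a read-only view of the bytes, so call stereo.copy() first if the samples need to be modified in place.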

Also, changing CHUNK to 1225 is best: 44100 is a multiple of 1225 (44100 = 36 * 1225), so no frames are lost when RATE / CHUNK is truncated to an integer.
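The arithmetic behind the lost frames can be checked with a short sketch that uses only the values from the question. The helper frames_recorded is hypothetical; it mirrors the loop bound int(RATE / CHUNK * RECORD_SECONDS) from the recording script.

```python
RATE = 44100

def frames_recorded(chunk, seconds, rate=RATE):
    """Frames captured by a loop doing int(rate / chunk * seconds) reads of `chunk` frames."""
    reads = int(rate / chunk * seconds)  # any fractional read is dropped
    return reads * chunk

for chunk in (1024, 1225):
    got = frames_recorded(chunk, seconds=1)
    print(chunk, got, RATE - got)
# 1024 44032 68
# 1225 44100 0
```

Any CHUNK that divides RATE * RECORD_SECONDS evenly avoids the loss; 1225 is simply one such value.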

answered Aug 29, 2013 at 19:48
