I am trying to read a bzip2-compressed CSV file in Python 3.2. For an uncompressed CSV file, this works:

import csv

datafile = open('./file.csv', mode='rt', newline='')
data = csv.reader(datafile)
for e in data:    # works
    process(e)

The problem is that BZ2File only supports creating a binary stream, and in Python 3, csv.reader needs an iterator that yields strings (i.e., a text stream). (The same issue occurs with gzip and zip files.)
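
To see the mismatch directly, here is a minimal sketch that decompresses the same tiny CSV from bzip2, gzip and zip and checks the type of the first line each stream yields. It assumes Python 3.3 or later (where BZ2File accepts a file object); the sample data and in-memory buffers are made up for illustration and stand in for the real files.

```python
import bz2
import gzip
import io
import zipfile

raw = b"a,b\n1,2\n"  # hypothetical sample CSV content

# In-memory stand-ins for file.csv.bz2, file.csv.gz and a zip archive.
bz2_stream = bz2.BZ2File(io.BytesIO(bz2.compress(raw)))
gz_stream = gzip.GzipFile(fileobj=io.BytesIO(gzip.compress(raw)))

zip_buf = io.BytesIO()
with zipfile.ZipFile(zip_buf, "w") as zf:
    zf.writestr("file.csv", raw)
zip_archive = zipfile.ZipFile(zip_buf)
zip_stream = zip_archive.open("file.csv")

# Every decompressor yields bytes lines, which csv.reader rejects.
first_lines = [next(iter(s)) for s in (bz2_stream, gz_stream, zip_stream)]
for line in first_lines:
    print(type(line).__name__, line)
```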

datafile = bz2.BZ2File('./file.csv.bz2', mode='r')
data = csv.reader(datafile)
for e in data:    # error
    process(e)

In particular, the indicated line throws the exception _csv.Error: iterator should return strings, not bytes (did you open the file in text mode?).

I've also tried data = csv.reader(codecs.EncodedFile(datafile, 'utf8')), but that doesn't fix the error.
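
A likely reason this attempt fails: codecs.EncodedFile recodes bytes into bytes (file encoding to data encoding), so iterating over it still produces bytes. A small sketch, using an io.BytesIO buffer as a stand-in for the BZ2File:

```python
import codecs
import io

# Stand-in for the BZ2File: any binary stream behaves the same way here.
binary = io.BytesIO(b"a,b\n1,2\n")

# EncodedFile decodes each line and re-encodes it, so the result is bytes again.
recoded = codecs.EncodedFile(binary, "utf8")
first = next(iter(recoded))
print(type(first))  # <class 'bytes'>
```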

How can I wrap the binary input stream so that it can be used in text mode?

1 Answer

This works for me:

import codecs, csv
f = codecs.open("file.csv", "r", "utf-8")
g = csv.reader(f)
for e in g:
    print(e)

In the case of BZ2:

import codecs, csv, bz2
f = bz2.BZ2File("./file.csv.bz2", mode="r")
c = codecs.iterdecode(f, "utf-8")
g = csv.reader(c)
for e in g:
    print(e)
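
Another way to wrap the binary stream, assuming Python 3.3 or later (BZ2File became an io.BufferedIOBase there and bz2.open was added, so this may not apply to 3.2), is to put an io.TextIOWrapper on top of it, or let bz2.open build that wrapper in "rt" mode. The sample file and its temporary path below are hypothetical stand-ins for ./file.csv.bz2.

```python
import bz2
import csv
import io
import os
import tempfile

# Write a small sample archive; this path stands in for ./file.csv.bz2.
path = os.path.join(tempfile.mkdtemp(), "file.csv.bz2")
with bz2.BZ2File(path, "w") as out:
    out.write("name,city\r\nAnn,Zürich\r\n".encode("utf-8"))

# Option 1: put a text layer on top of the binary BZ2File yourself.
# newline="" leaves line endings untouched, as the csv docs recommend.
with io.TextIOWrapper(bz2.BZ2File(path, "r"), encoding="utf-8", newline="") as text:
    rows_wrapped = list(csv.reader(text))

# Option 2: bz2.open in "rt" mode builds the same kind of TextIOWrapper.
with bz2.open(path, "rt", encoding="utf-8", newline="") as text:
    rows_opened = list(csv.reader(text))

# Both lists hold [['name', 'city'], ['Ann', 'Zürich']].
```

The same idea should carry over to the other formats from the question: gzip.open(path, "rt", encoding="utf-8", newline="") for gzip, and io.TextIOWrapper(zipfile.ZipFile(path).open(name), encoding="utf-8", newline="") for a member of a zip archive (where name is the hypothetical member name).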

2 Comments

And how exactly did you intend on using this with a bz2-compressed file?
@vz0: The last detail to check is whether newlines are handled as expected (i.e., not translated). In other words, if newline sequences can occur inside quoted values, an uncompressed CSV would be opened with open(fname, 'r', newline=''). It likely works correctly here too. Can you try?
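
One way to check this: a minimal sketch, assuming Python 3.3+ (BZ2File reading from a file object) and a made-up sample whose second record has a CRLF inside a quoted field. Since codecs.iterdecode only decodes bytes to str and does not translate line endings, csv.reader should see the same characters that open(..., newline='') would give it.

```python
import bz2
import codecs
import csv
import io

# Hypothetical sample: the second record contains \r\n inside a quoted field.
raw = b'id,text\r\n1,"line one\r\nline two"\r\n'
compressed = io.BytesIO(bz2.compress(raw))  # in-memory stand-in for file.csv.bz2

with bz2.BZ2File(compressed) as f:
    # iterdecode yields decoded chunks with their line endings intact.
    rows = list(csv.reader(codecs.iterdecode(f, "utf-8")))

print(rows)  # [['id', 'text'], ['1', 'line one\r\nline two']]
```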
