Write non-Unicode using csv module

Question

While migrating to Python 3, I noticed that some files we generate with the built-in csv module now have each string wrapped in a b'...' prefix.

Here's the code that should generate a .csv for a list of dogs, according to some parameters defined by export_fields (so the values are always unicode data):

import csv
from io import StringIO

file_content = StringIO()
csv_writer = csv.writer(
    file_content, delimiter='\t', quotechar='"', quoting=csv.QUOTE_MINIMAL
)
csv_writer.writerow([
    header_name.encode('cp1252') for _v, header_name in export_fields
])
# Write content
for dog in dogs:
    csv_writer.writerow([
        get_value(dog).encode('cp1252') for get_value, _header in export_fields
    ])

The problem is that once I return file_content.getvalue(), I get:

b'Does he bark?'    b'Full     Name'    b'Gender'
b'Sometimes, yes'   b'Woofy the dog'    b'Male'

Instead of (indentation has been modified to be readable on SO):

'Does he bark?'   'Full     Name'   'Gender'
'Sometimes, yes'  'Woofy the dog'   'Male'

I did not find any encoding parameter in the csv module. I would like the whole file to be encoded in cp1252, so I don't really care whether the encoding is done while iterating over the lines or on the constructed file itself.

So, does anyone know how to generate a proper string, containing only cp1252 encoded strings?

Why are you encoding in the first place? The open file object takes care of that.
– Martijn Pieters, Commented Jul 29, 2016 at 10:52
@MartijnPieters Maybe my question is incomplete then: I want to return the string through Django: return HttpResponse(generate_csv_file()). Should I handle encoding at Django level instead?
– Maxime Lorant, Commented Jul 29, 2016 at 10:55
See my answer; you are approaching this at the wrong level; tabs and quotechars need to be encoded too, but this is the job of the I/O level, not the csv module or the code producing rows.
– Martijn Pieters, Commented Jul 29, 2016 at 10:57

Martijn Pieters · Accepted Answer · 2016-07-29 11:04:20Z

The csv module deals with text, and converts anything that is not a string to a string using str().

Don't pass in bytes objects. Pass in str objects or types that cleanly convert to strings with str(). That means you should not encode strings.
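To see where the b' prefixes come from, here is a minimal sketch (the sample row is made up for illustration): the writer calls str() on every non-string value, and str() of a bytes object is its repr, prefix and quotes included.

```python
import csv
from io import StringIO

buffer = StringIO()
writer = csv.writer(buffer, delimiter='\t')

# Illustrative row: one encoded (bytes) value, one int, one plain str.
writer.writerow(['Male'.encode('cp1252'), 3, 'Male'])

written = buffer.getvalue()
print(repr(written))  # the bytes value shows up as the text b'Male'
```

Only the first cell carries the unwanted prefix; the int and the str are written as expected.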

If you need cp1252 output, encode the StringIO value:

file_content.getvalue().encode('cp1252')

as StringIO objects also deal in text only.
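A minimal sketch of that first option, assuming a hypothetical helper name rows_to_cp1252 and made-up sample rows: write plain str values into the StringIO, then encode the finished text once.

```python
import csv
from io import StringIO


def rows_to_cp1252(rows):
    """Write rows of str values as tab-separated text, then encode once."""
    buffer = StringIO(newline='')  # keep the writer's \r\n untouched
    writer = csv.writer(
        buffer, delimiter='\t', quotechar='"', quoting=csv.QUOTE_MINIMAL
    )
    writer.writerows(rows)
    # The whole document (values, tabs, line endings) is encoded here.
    return buffer.getvalue().encode('cp1252')


# Illustrative data with characters that differ between cp1252 and UTF-8.
data = rows_to_cp1252([['Full Name', 'Price'], ['Zoé', '5 €']])
print(data)
```

Characters that cp1252 cannot represent would raise UnicodeEncodeError at the encode() call; how to handle them (for example with the errors argument) depends on the data.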

Better yet, use a BytesIO object with a TextIOWrapper() to do the encoding for you as the csv module writes to the file object:

import csv
from io import BytesIO, TextIOWrapper

file_content = BytesIO()
wrapper = TextIOWrapper(file_content, encoding='cp1252', line_buffering=True)
csv_writer = csv.writer(
    wrapper, delimiter='\t', quotechar='"', quoting=csv.QUOTE_MINIMAL)

# write rows

result = file_content.getvalue()

I've enabled line-buffering on the wrapper so that it'll auto-flush to the BytesIO instance every time a row is written.
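For comparison, a small sketch of the same setup without line buffering, where the wrapper has to be flushed by hand before reading the bytes. The newline='' argument is an addition of this sketch (the csv documentation recommends it for file objects so the writer's \r\n is not translated); the row is made up.

```python
import csv
from io import BytesIO, TextIOWrapper

raw = BytesIO()
# No line_buffering here, so small writes stay in the wrapper's buffer.
wrapper = TextIOWrapper(raw, encoding='cp1252', newline='')
writer = csv.writer(wrapper, delimiter='\t')

writer.writerow(['Woofy the dog', 'Male'])
before_flush = raw.getvalue()  # expected to still be empty at this point

wrapper.flush()                # push the pending text down as cp1252 bytes
after_flush = raw.getvalue()

print(before_flush, after_flush)
```

Either approach works; line buffering just removes the need to remember the flush() call.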

Now file_content.getvalue() produces a bytestring:

>>> from io import BytesIO, TextIOWrapper
>>> import csv
>>> file_content = BytesIO()
>>> wrapper = TextIOWrapper(file_content, encoding='cp1252', line_buffering=True)
>>> csv_writer = csv.writer(wrapper, delimiter='\t', quotechar='"', quoting=csv.QUOTE_MINIMAL)
>>> csv_writer.writerow(['Does he bark?', 'Full     Name', 'Gender'])
36
>>> csv_writer.writerow(['Sometimes, yes', 'Woofy the dog', 'Male'])
35
>>> file_content.getvalue()
b'Does he bark?\tFull     Name\tGender\r\nSometimes, yes\tWoofy the dog\tMale\r\n'
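Putting this together for the case in the question, here is a hedged sketch of what generate_csv_file could look like. The function signature, the shape of export_fields as (getter, header) pairs, and the sample dog are assumptions based on the question's code, and newline='' is added so line endings are not translated on any platform.

```python
import csv
from io import BytesIO, TextIOWrapper


def generate_csv_file(dogs, export_fields):
    """Return the export as cp1252-encoded bytes (assumed signature)."""
    file_content = BytesIO()
    wrapper = TextIOWrapper(
        file_content, encoding='cp1252', newline='', line_buffering=True
    )
    csv_writer = csv.writer(
        wrapper, delimiter='\t', quotechar='"', quoting=csv.QUOTE_MINIMAL
    )
    # Pass str values only; the wrapper does the encoding.
    csv_writer.writerow([header for _get_value, header in export_fields])
    for dog in dogs:
        csv_writer.writerow([get_value(dog) for get_value, _h in export_fields])
    wrapper.flush()  # harmless with line buffering, but makes the intent explicit
    return file_content.getvalue()


# Hypothetical data mirroring the question's example.
export_fields = [
    (lambda dog: dog['name'], 'Full Name'),
    (lambda dog: dog['gender'], 'Gender'),
]
dogs = [{'name': 'Woofy the dog', 'gender': 'Male'}]
result = generate_csv_file(dogs, export_fields)

# In a Django view the bytes could then be returned directly, e.g.:
#   return HttpResponse(result, content_type='text/csv; charset=windows-1252')
```

The bytes are read inside the function, while the wrapper is still alive, because a TextIOWrapper closes its underlying buffer when it is garbage-collected.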

edited Jul 29, 2016 at 11:04

answered Jul 29, 2016 at 10:55


3 Comments

Maxime Lorant Over a year ago

Looks like it works with the wrapper indeed (once flushed, but you made the edit before I had the time to comment). Tests passed so 99% sure it is the right answer :)

Martijn Pieters Over a year ago

@MaximeLorant: I've now switched it to using line-buffering; avoids having to manually flush. Sorry about that.

Maxime Lorant Over a year ago

Seems cleaner indeed! Thanks for the tip.
