I'm trying to write a custom extraction method for Babel that extracts strings from a specific column in a CSV file. I followed the documentation here.

Here is my extraction method code:

def extract_csv(fileobj, keywords, comment_tags, options):
    import csv
    reader = csv.DictReader(fileobj, delimiter=',')
    for row in reader:
        if row and row['caption'] != '':
            yield (reader.line_num, '', row['caption'], '')

When I try to run the extraction, I get this error:

File "/Users/tiagosilva/repos/naltio/csv_extractor.py", line 18, in extract_csv
    for row in reader:
File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/csv.py", line 111, in __next__
    self.fieldnames
File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/csv.py", line 98, in fieldnames
    self._fieldnames = next(self.reader)
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)

It seems the fileobj that is passed to the function was opened in binary mode.
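The failure can be reproduced without Babel. The following is a minimal sketch, assuming made-up sample data and a hypothetical helper named `first_row`; it only shows that `csv.DictReader` accepts a text stream and rejects a binary one.

```python
import csv
import io


def first_row(stream):
    """Return the first data row that csv.DictReader yields for `stream`."""
    return next(csv.DictReader(stream, delimiter=','))


# Hypothetical sample data standing in for the real CSV file.
raw = b"id,caption\n1,Hello\n"

# A text-mode stream works.
text_row = first_row(io.StringIO(raw.decode('utf-8')))

# A binary-mode stream fails as soon as the header line is read.
try:
    first_row(io.BytesIO(raw))
    error_message = None
except csv.Error as exc:
    error_message = str(exc)
```

The exact wording of the error message varies between Python versions, but it is raised as `csv.Error` in each case.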

How can I make this work? I can think of two possible solutions, but I don't know how to code them:

1) Is there a way to use a binary file with DictReader?

2) Is there a way to signal babel to open the file in text mode?

I'm open to other solutions not listed here.

1 Answer

I actually found a way to do it!

It's solution 1: handle the binary file directly. Wrap the binary file in an io.TextIOWrapper, which decodes it as text, and pass the wrapper to DictReader.

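import csv
import io

def extract_csv(fileobj, keywords, comment_tags, options):
    # Babel passes fileobj opened in binary mode, so decode it as text first.
    # newline='' lets the csv module handle line endings itself.
    with io.TextIOWrapper(fileobj, encoding='utf-8', newline='') as text_file:
        reader = csv.DictReader(text_file, delimiter=',')

        for row in reader:
            # Skip rows that have no caption or an empty one.
            if row.get('caption'):
                # Tuple layout: (lineno, funcname, message, comments)
                yield (reader.line_num, '', row['caption'], [])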
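One thing to keep in mind: leaving a `with io.TextIOWrapper(...)` block closes the underlying binary file too. If the caller still needs that file afterwards, a possible variant is to call `detach()` instead of closing. This is only a sketch under that assumption; `iter_captions` and the sample data are hypothetical names used for illustration, not part of Babel.

```python
import csv
import io


def iter_captions(binary_file, encoding='utf-8'):
    """Yield (line_number, caption) pairs from a binary CSV stream."""
    text_file = io.TextIOWrapper(binary_file, encoding=encoding, newline='')
    try:
        reader = csv.DictReader(text_file, delimiter=',')
        for row in reader:
            caption = row.get('caption')
            if caption:
                yield reader.line_num, caption
    finally:
        # Hand the underlying stream back to the caller instead of closing it.
        text_file.detach()


# Hypothetical in-memory data standing in for a file opened in binary mode.
source = io.BytesIO("id,caption\n1,Olá\n2,\n3,Adeus\n".encode('utf-8'))
captions = list(iter_captions(source))
```

After the generator is exhausted, `source` is still open and can be rewound or read again by the caller.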

4 Comments

In case it helps anyone else: this approach also works well if you have a zip file containing one or more CSV files and are using the zipfile module from Python 3.6+ (and possibly older versions), which only supports opening archive members in binary mode.
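A minimal sketch of the zip case described in this comment, assuming an archive built in memory with a made-up member name (`captions.csv`) and contents:

```python
import csv
import io
import zipfile

# Build a small archive in memory (hypothetical file name and contents).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as archive:
    archive.writestr('captions.csv', "id,caption\n1,Hello\n2,World\n")

rows = []
with zipfile.ZipFile(buffer) as archive:
    # ZipFile.open() returns a binary file object, so wrap it before
    # handing it to the csv module.
    with archive.open('captions.csv') as member:
        text_file = io.TextIOWrapper(member, encoding='utf-8', newline='')
        for row in csv.DictReader(text_file):
            rows.append(row['caption'])
```

The member is decoded incrementally as the reader iterates, so the whole CSV does not need to be read into a string first.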
This compact solution solved the problem I'm facing, wherein an unknown file blob has already been opened as binary but needs to be handled as text if it's actually a CSV (and I can't change how it is originally ingested). Every other answer I've seen changes how you open it, rather than how you process it.
Thanks for this. This is a really neat solution to the problem. So far, all the other solutions I've seen ask me to load the entire content of the file in memory before passing it to the CSV reader.
Awesome, exactly what I needed. A CSV was uploaded and Flask/Werkzeug passed it through as a binary file, making csv.DictReader break. Your solution resolved this!
