csv.reader read from Requests stream: iterator should return strings, not bytes

Question

I'm trying to stream a response into csv.reader using requests.get(url, stream=True) so that I can handle fairly large data feeds. My code worked fine with Python 2.7. Here's the code:

response = requests.get(url, stream=True)
ret = csv.reader(response.iter_lines(decode_unicode=True), delimiter=delimiter, quotechar=quotechar,
    dialect=csv.excel_tab)
for line in ret:
    line.get('name')

Unfortunately, after migrating to Python 3.6 I get the following error:

_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)

I tried to find a wrapper or decorator that would convert the output of the response.iter_lines() iterator from bytes to str, but had no luck. I have already tried the io package and codecs. codecs.iterdecode doesn't split the data into lines; it seems to split by chunk_size instead, and in that case csv.reader complains as follows:

_csv.Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?

iter_lines() is a generator, so I can't call decode() on it (?). On the other hand, won't a lambda expression applied to the generator fill my memory with all the data? I'd like to avoid that; that's why I'm streaming in the first place.
– mdargacz, Commented Sep 21, 2016 at 15:00
You're right, that won't work. You have to create a generator that calls the original one and decodes the bytes on the fly. Not sure how to do that.
– Jean-François Fabre ♦, Commented Sep 21, 2016 at 15:05
@Jean-FrançoisFabre Right, as I thought... I'll probably write a wrapping generator for that, but maybe someone will come up with a prettier solution.
– mdargacz, Commented Sep 21, 2016 at 15:09
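
For what it's worth, the wrapping generator discussed in these comments could look roughly like the sketch below. The names decode_lines and fake_stream are made up for illustration, and fake_stream only stands in for response.iter_lines() so the example runs without a network connection. The point it tries to show is that a generator decodes one line at a time, so it should not pull the whole feed into memory.

```python
import csv


def decode_lines(byte_lines, encoding="utf-8"):
    """Lazily decode an iterable of bytes lines, one line per step."""
    for raw in byte_lines:
        yield raw.decode(encoding)


pulled = []  # records which lines have actually been read from the "stream"


def fake_stream():
    # Hypothetical stand-in for response.iter_lines(): bytes lines, no newlines
    for raw in (b"name\tprice", b"apple\t1.20", b"pear\t0.95"):
        pulled.append(raw)
        yield raw


reader = csv.reader(decode_lines(fake_stream()), dialect=csv.excel_tab)

first_row = next(reader)          # csv.reader asks for a single line here
pulled_after_first = len(pulled)  # only that one line has left the stream
remaining = list(reader)          # the rest is consumed on demand
```

In real use, decode_lines(response.iter_lines()) would take the place of decode_lines(fake_stream()).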

Dimitris Fasarakis Hilliard · Accepted Answer · 2016-09-21 15:44:34Z

9

I'm guessing you could wrap this in a generator expression (genexp) and feed the decoded lines to csv.reader:

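import csv
from contextlib import closing

import requests

# url is the feed URL from the question
with closing(requests.get(url, stream=True)) as r:
    # decode each bytes line lazily, as csv.reader pulls it
    f = (line.decode('utf-8') for line in r.iter_lines())
    reader = csv.reader(f, delimiter=',', quotechar='"')
    for row in reader:
        print(row)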

With some sample data in Python 3.5 this satisfies csv.reader: every line fed to it is first decoded in the genexp, one line at a time. I'm also using closing from contextlib, as is generally suggested, to close the response automatically.
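
A possible variation, as a sketch only: as far as I know, iter_lines(decode_unicode=True) passes bytes through unchanged when the response has no encoding set (for example, when the server sends no charset), which may be why the question's original call still produced bytes. Setting a fallback encoding first should let Requests do the decoding. The helper name text_lines, its fallback_encoding parameter, and the StubResponse class are all hypothetical; the stub only imitates that behaviour so the example runs offline.

```python
import csv


def text_lines(response, fallback_encoding="utf-8"):
    """Return an iterator of str lines from a streamed response (hypothetical helper)."""
    if response.encoding is None:
        # No charset known: choose one so that decoding actually happens.
        response.encoding = fallback_encoding
    return response.iter_lines(decode_unicode=True)


class StubResponse:
    """Offline stand-in for requests.Response, for this demo only."""

    encoding = None  # as when the server sends no charset

    def iter_lines(self, decode_unicode=False):
        for raw in (b"name,price", b"apple,1.20"):
            decode = decode_unicode and self.encoding is not None
            yield raw.decode(self.encoding) if decode else raw


stub = StubResponse()
rows = list(csv.reader(text_lines(stub)))
```

With a real response, text_lines(r) would be passed to csv.reader inside the same with closing(...) block.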

edited Sep 21, 2016 at 15:44

answered Sep 21, 2016 at 15:38


1 Comment

mdargacz Over a year ago

Right, genexp is good enough. Thanks for the closing part, I wasn't aware of that.
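
One side note on the question's loop: it calls line.get('name'), but csv.reader yields plain lists, which have no get method. If lookup by column name is wanted, csv.DictReader can consume the same decoded lines. A minimal sketch, assuming the first line of the feed is a header row and using a hard-coded list of bytes lines as a stand-in for response.iter_lines():

```python
import csv

# Hypothetical stand-in for response.iter_lines(): bytes lines, header first
byte_lines = [b"name\tprice", b"apple\t1.20", b"pear\t0.95"]

# Same genexp idea as in the answer: decode lazily, line by line
decoded = (raw.decode("utf-8") for raw in byte_lines)

# DictReader takes field names from the first line and yields dict rows
reader = csv.DictReader(decoded, dialect=csv.excel_tab)
names = [row.get("name") for row in reader]
```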
