I'm trying to stream a response into csv.reader using requests.get(url, stream=True) so that I can handle fairly large data feeds. My code worked fine with Python 2.7. Here's the code:

response = requests.get(url, stream=True)
ret = csv.reader(response.iter_lines(decode_unicode=True), delimiter=delimiter, quotechar=quotechar,
    dialect=csv.excel_tab)
for line in ret:
    line.get('name')

Unfortunately, after migrating to Python 3.6 I get the following error:

_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)

I tried to find a wrapper/decorator that would convert the output of the response.iter_lines() iterator from bytes to str, but had no luck. I have already tried the io package and also codecs. Using codecs.iterdecode doesn't split the data into lines; it seems to split by chunk_size instead, and in that case csv.reader complains as follows:

_csv.Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?

3 Comments

  • iter_lines() is a generator, so I can't call decode() on it (?). On the other hand, won't a lambda expression applied to the generator fill my memory with all the data? I'd like to avoid that; that's why I'm streaming it in the first place. Commented Sep 21, 2016 at 15:00
  • you're right, that won't work. You have to create a generator which calls the original one and decodes the bytes on-the-fly. Not sure how to do that. Commented Sep 21, 2016 at 15:05
  • @Jean-FrançoisFabre right, as I thought... I'll probably write some wrapping generator for that, but maybe someone will come up with a prettier solution. Commented Sep 21, 2016 at 15:09
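
The wrapping generator discussed in these comments can be sketched roughly as follows. This is only an illustration under stated assumptions: decode_lines and fake_iter_lines are hypothetical names, and fake_iter_lines stands in for response.iter_lines() so the example runs without a network call. It also records how many lines have been pulled, to show that decoding happens one line at a time rather than loading the whole feed into memory.

```python
import csv


def decode_lines(byte_lines, encoding="utf-8"):
    """Yield each bytes line decoded to str, one at a time (hypothetical helper)."""
    for raw in byte_lines:
        yield raw.decode(encoding)


pulled = []  # records which lines have been consumed so far


def fake_iter_lines():
    """Stand-in for response.iter_lines(): yields bytes lines without newlines."""
    for raw in (b"name,price", b"caf\xc3\xa9,3", b"tea,2"):
        pulled.append(raw)
        yield raw


reader = csv.reader(decode_lines(fake_iter_lines()))
header = next(reader)
pulled_after_header = len(pulled)  # how many lines were read to produce the header
remaining = list(reader)
print(header, pulled_after_header, remaining)
```

With a real response, the same helper would presumably be called as decode_lines(response.iter_lines()), assuming the feed is UTF-8 encoded.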

1 Answer

I'm guessing you could wrap iter_lines() in a generator expression (genexp) and feed the decoded lines to csv.reader:

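import csv
from contextlib import closing

import requests

# url is the feed URL from the question
with closing(requests.get(url, stream=True)) as r:
    # decode each bytes line lazily, as csv.reader asks for it
    f = (line.decode('utf-8') for line in r.iter_lines())
    reader = csv.reader(f, delimiter=',', quotechar='"')
    for row in reader:
        print(row)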

Using some sample data in Python 3.5, this satisfies csv.reader, because every line fed to it is first decoded in the genexp. I'm also using closing from contextlib, as is generally suggested, to close the response automatically.
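
As a possible alternative to the genexp, codecs.iterdecode (mentioned in the question) can be applied to the already-split lines instead of to raw chunks. The sketch below uses made-up sample data: chunks imitates arbitrary chunk boundaries and lines imitates what iter_lines() would yield, so both names are assumptions for illustration only. It shows that decoding chunks leaves newlines embedded in the strings, which is what csv.reader objects to, while decoding per line gives csv.reader one record at a time.

```python
import codecs
import csv

# Hypothetical sample data: the same feed split at arbitrary chunk
# boundaries (one boundary falls inside a multi-byte character) ...
chunks = [b"name,price\ncaf\xc3", b"\xa9,3\ntea,2\n"]
# ... and split into lines, as iter_lines() would yield them.
lines = [b"name,price", b"caf\xc3\xa9,3", b"tea,2"]

# Decoding chunks keeps the chunk boundaries, so items contain
# embedded newlines instead of being single lines.
decoded_chunks = list(codecs.iterdecode(chunks, "utf-8"))

# Decoding line by line yields one str per line, which csv.reader accepts.
rows = list(csv.reader(codecs.iterdecode(lines, "utf-8")))

print(decoded_chunks)
print(rows)
```

One caveat with this variant: codecs.iterdecode skips items that decode to an empty string, so blank lines in the feed would be dropped rather than passed on to csv.reader.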

1 Comment

Right, genexp is good enough. Thanks for the closing part, I wasn't aware of that.
