I am reading McKinney's data analysis book, and he has shared a 150 MB file. Although this topic has been discussed extensively in "Progress Bar while download file over http with Requests", the code in the accepted answer throws an error for me. I am a beginner, so I am unable to resolve it.

I want to download the following file:

https://raw.githubusercontent.com/wesm/pydata-book/2nd-edition/datasets/fec/P00000001-ALL.csv

Here's the code without progress bar:

DATA_PATH='./Data'
filename = "P00000001-ALL.csv"
url_without_filename = "https://raw.githubusercontent.com/wesm/pydata-book/2nd-edition/datasets/fec"

url_with_filename = url_without_filename + "/" + filename
local_filename = DATA_PATH + '/' + filename

#Write the file on local disk
r = requests.get(url_with_filename)  #without streaming
with open(local_filename, 'w', encoding=r.encoding) as f:
    f.write(r.text)

This works, but without a progress bar I cannot tell how far the download has got.

Here's the code adapted from "Progress Bar while download file over http with Requests" and "How to download large file in python with requests.py?":

#Option 2:
#Write the file on local disk
r = requests.get(url_with_filename, stream=True)  # added stream parameter
total_size = int(r.headers.get('content-length', 0))

with open(local_filename, 'w', encoding=r.encoding) as f:
    #f.write(r.text)
    for chunk in tqdm(r.iter_content(1024), total=total_size, unit='B', unit_scale=True):
        if chunk:
            f.write(chunk)

There are two problems with the second option (i.e. with streaming and tqdm package):

a) The file size isn't calculated correctly. The actual size is 157MB, but the total_size turns out to be 25MB.

b) An even bigger problem than a) is that I get the following error:

  0%|          | 0.00/24.6M [00:00<?, ?B/s]
Traceback (most recent call last):
  File "C:\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 3265, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-31-abbe9270092b>", line 6, in <module>
    f.write(data)
TypeError: write() argument must be str, not bytes

As a beginner, I am unsure how to solve these two issues. I spent a lot of time going through the GitHub page of tqdm, but I couldn't follow it. I'd appreciate any help.


I am assuming that the readers know that we need to import requests and tqdm. So, I haven't included the code for importing these basic packages.


Here's the code for those who are curious:

import sys

with open(local_filename, 'wb') as f:
    r = requests.get(url_with_filename, stream=True)  # added stream parameter
    # total_size = int(r.headers.get('content-length', 0))
    # Note: len(r.content) reads the whole body into memory before the loop starts,
    # so the bar below tracks writing to disk rather than the download itself.
    total_size = len(r.content)
    downloaded = 0
    # chunk_size = max(1024*1024,int(total_size/1000))
    chunk_size = 1024
    #for chunk in tqdm(r.iter_content(chunk_size=chunk_size),total=total_size,unit='KB',unit_scale=True):
    for chunk in r.iter_content(chunk_size=chunk_size):
        downloaded += len(chunk)
        f.write(chunk)
        # Draw a 50-character bar, redrawn in place with a carriage return
        done = int(50 * downloaded / total_size)
        sys.stdout.write("\r[%s%s]" % ('=' * done, ' ' * (50 - done)))
        sys.stdout.flush()
1
  • Find a way to get the true size of the file and the progress bar is easy. Commented Oct 12, 2018 at 9:03
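
One likely reason for the 25 MB versus 157 MB mismatch is transfer compression: requests asks for gzip by default, Content-Length then counts the compressed bytes, and iter_content() yields the decompressed ones. The sketch below is an assumption about how one might guard against that, not a confirmed diagnosis of this server. The helper name expected_total is made up for illustration, and it takes a plain dict where requests would supply r.headers.

```python
import gzip


def expected_total(headers):
    """Return the byte count iter_content() should yield, or None if unknown.

    `headers` is a plain dict here; with requests you would pass r.headers.
    """
    # A compressed body makes Content-Length describe the compressed size.
    if headers.get("Content-Encoding", "identity").lower() != "identity":
        return None
    length = headers.get("Content-Length")
    return int(length) if length is not None else None


# Offline illustration with made-up CSV-like data: repetitive text shrinks a lot.
body = b"cand_id,cand_nm,contb_receipt_amt\n" * 10_000
compressed = gzip.compress(body)

plain_headers = {"Content-Length": str(len(body))}
gzip_headers = {"Content-Length": str(len(compressed)), "Content-Encoding": "gzip"}

plain_total = expected_total(plain_headers)  # usable as a progress total
gzip_total = expected_total(gzip_headers)    # None: size on disk is unknown
```

Another option, assuming the server honours it, is to send the request header Accept-Encoding: identity so that the body arrives uncompressed and Content-Length matches the bytes written.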

3 Answers

1

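As the error says:

write() argument must be str, not bytes

so convert each chunk to a string before writing. Calling str(chunk) would write the bytes repr (b'...') rather than the text, so decode it instead:

f.write(chunk.decode(r.encoding))

Note: a multi-byte character can be split across two chunks, which makes per-chunk decoding fail, so I would instead suggest opening the file in binary mode ('wb') and writing the chunks unchanged.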
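
To make the bytes-to-string step concrete, here is a small offline sketch. It shows what str() returns for a bytes object and one way to decode chunks when a character straddles a chunk boundary, using the standard library's incremental decoder. The function name decode_chunks and the sample chunks are made up for illustration, and UTF-8 is an assumption about the file's encoding.

```python
import codecs


def decode_chunks(chunks, encoding="utf-8"):
    """Yield text for each bytes chunk, buffering incomplete characters."""
    decoder = codecs.getincrementaldecoder(encoding)()
    for chunk in chunks:
        text = decoder.decode(chunk)
        if text:
            yield text
    # Flush the decoder; this raises if the stream ended mid-character.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


# str() on bytes gives the repr, including the b'...' wrapper.
as_repr = str(b"a,b")

# "é" is two bytes in UTF-8 and is split across the two chunks here.
chunks = [b"caf\xc3", b"\xa9\n"]
decoded = "".join(decode_chunks(chunks))
```

Writing the raw bytes to a file opened with 'wb' avoids decoding altogether, which is usually simpler for a plain download.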

0
with open(filename, 'wb') as f:
    f.write(r.content)

This should fix your writing problem. Write r.content, not r.text: type(r.content) is <class 'bytes'>, which is what a file opened in binary mode expects. Binary mode does not take an encoding argument, so it is left out here.
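
The difference between the file modes can be checked without any network access. The sketch below uses a made-up two-line payload and a temporary file. It reproduces the TypeError from the question in text mode, shows that combining 'wb' with encoding= is rejected, and then writes the bytes in binary mode.

```python
import os
import tempfile

payload = "a,b\n1,2\n".encode("utf-8")  # made-up sample data
path = os.path.join(tempfile.mkdtemp(), "sample.csv")

# Text mode rejects bytes with the TypeError seen in the question.
try:
    with open(path, "w", encoding="utf-8") as f:
        f.write(payload)
except TypeError as exc:
    text_mode_error = str(exc)

# Binary mode combined with encoding= fails before anything is written.
try:
    open(path, "wb", encoding="utf-8")
except ValueError as exc:
    binary_encoding_error = str(exc)

# Binary mode on its own accepts the bytes unchanged.
with open(path, "wb") as f:
    f.write(payload)

with open(path, "rb") as f:
    round_trip = f.read()
```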

0

Try writing with wb instead of just w. Binary mode takes no encoding argument, and it needs the bytes in r.content rather than the str in r.text.

with open(local_filename, 'wb') as f:
    f.write(r.content)

1 Comment

I just checked. Both issues persist. Also, the issue isn't about plainly downloading the file, but about using progress bar and calculating the size correctly.
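
For the progress bar itself, one point worth noting is that wrapping r.iter_content(1024) in tqdm counts iterations, so each 1024-byte chunk advances the bar by 1 while total is given in bytes. A common alternative is to create the bar separately and call its update() with len(chunk). The following is a minimal sketch under assumptions, not a tested fix for this particular server. The names copy_chunks and download, and the 64 KiB chunk size, are made up for illustration.

```python
import io


def copy_chunks(chunks, out, on_progress):
    """Write each bytes chunk to `out` and report its length via on_progress."""
    written = 0
    for chunk in chunks:
        if not chunk:  # skip empty keep-alive chunks
            continue
        out.write(chunk)
        on_progress(len(chunk))
        written += len(chunk)
    return written


def download(url, path, chunk_size=64 * 1024):
    # Third-party imports are kept local so copy_chunks() can be tried without them.
    import requests
    from tqdm import tqdm

    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        # Assumption: only trust Content-Length when the body is not compressed,
        # otherwise let tqdm run without a total.
        total = None
        if "Content-Encoding" not in r.headers and "Content-Length" in r.headers:
            total = int(r.headers["Content-Length"])
        with open(path, "wb") as f, tqdm(total=total, unit="B", unit_scale=True) as bar:
            return copy_chunks(r.iter_content(chunk_size=chunk_size), f, bar.update)


# Offline check of the copying logic with in-memory data.
buf = io.BytesIO()
sizes = []
total_written = copy_chunks([b"ab", b"", b"cde"], buf, sizes.append)
```

With the variables from the question, the call would be download(url_with_filename, local_filename). When the total is unknown, tqdm still shows the running byte count and rate, just without a percentage.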
