Downloading a file with a URL using python

Question

I want to download the file at the following URL using Python. I tried the code below, but it does not seem to work. I think the error is related to the file format. I would be glad if you could suggest modifications to the code, or new code that I can use for this purpose.

Link to the website

https://www.gov.uk/government/statistics/transport-use-during-the-coronavirus-covid-19-pandemic

URL required to be downloaded

https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/959864/COVID-19-transport-use-statistics.ods

My Code

from urllib import request


response = request.urlopen("https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/959864/COVID-19-transport-use-statistics.ods")
csv = response.read()


csvstr = str(csv).strip("b'")

lines = csvstr.split("\\n")
f = open("historical.csv", "w")
for line in lines:
   f.write(line + "\n")
f.close()

Basically, I only want to download the file. I have heard that BeautifulSoup can be used for that, but I don't have much experience with it. Any code that serves my purpose would be highly appreciated.

Thanks

"it seems like not working" - how exactly is it "not working"?
– ForceBru, Commented Feb 14, 2021 at 12:07
The data is encoded, and the csv file does not show the actual content.
– python_coder_, Commented Feb 14, 2021 at 12:11

watch-this · Accepted Answer · 2021-02-14 12:33:10Z

5

To download the file:

In [1]: import requests

In [2]: url = 'https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/959864/COVID-19-transport-use-statistics.ods'

In [3]: response = requests.get(url)
   ...: response.raise_for_status()  # stop on an HTTP error instead of saving an error page
   ...: with open('COVID-19-transport-use-statistics.ods', 'wb') as out_file:
   ...:     out_file.write(response.content)  # write the raw bytes unchanged

You can then read the file with pandas-ods-reader. Install it by running:

pip install pandas-ods-reader

Then:

In [4]: from pandas_ods_reader import read_ods

In [5]: df = read_ods('COVID-19-transport-use-statistics.ods', 1)

In [6]: df
Out[6]: 
                   Department for Transport statistics  ...   unnamed.9
0    https://www.gov.uk/government/statistics/trans...  ...        None
1                                                 None  ...        None
2    Use of transport modes: Great Britain, since 1...  ...        None
3    Figures are percentages of an equivalent day o...  ...        None
4                                                 None  ...  Percentage
..                                                 ...  ...         ...
390                  Transport for London Tube and Bus  ...        None
391                               Buses (excl. London)  ...        None
392                                           Cycling   ...        None
393                                  Any other queries  ...        None
394                                    Media enquiries  ...        None

If you want a CSV, you can then save the DataFrame with df.to_csv('my_data.csv', index=False).

answered Feb 14, 2021 at 12:33

watch-this


6 Comments

python_coder_ Over a year ago

Thank you! Moving on: without giving the URL of the .ods file, can we download this file starting from the URL of the website?

watch-this Over a year ago

Yes, this can be done by getting the target URL from the website using XPath or BeautifulSoup and then following the exact same steps I mentioned.
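The link-extraction step described above might look roughly like the sketch below. To keep it dependency-free it swaps in the standard library's html.parser in place of BeautifulSoup or XPath; LinkCollector and find_file_links are hypothetical names, and it assumes the page exposes the file as a plain <a href="..."> link, which was not verified here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def find_file_links(html, base_url, suffix=".ods"):
    """Return absolute URLs of links whose path ends with `suffix` (hypothetical helper)."""
    collector = LinkCollector()
    collector.feed(html)
    links = []
    for href in collector.hrefs:
        absolute = urljoin(base_url, href)  # resolve relative links against the page URL
        path = urlparse(absolute).path      # ignore any query string when checking the suffix
        if path.lower().endswith(suffix) and absolute not in links:
            links.append(absolute)
    return links
```

The HTML itself could be fetched with requests.get(page_url).text and passed in together with page_url; the first entry of the returned list can then be downloaded exactly as in the answer. With BeautifulSoup installed, the collector class could instead be replaced by BeautifulSoup(html, "html.parser").find_all("a", href=True).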

python_coder_ Over a year ago

I tried bs4 and it works. Now I have a problem when I read the ods file: because the downloaded file has a header in its first 6 rows, the dataframe is not interpreted properly. Can you please suggest a modification?

watch-this Over a year ago

What do you mean by not properly interpreted?

python_coder_ Over a year ago

The columns should be Date, Cars, Light Commercial Vehicles, ... But because the .ods file contains a header in its first 6 rows, the columns created are different.
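One possible way to handle the header rows is to promote the row that actually holds the column names and drop everything above it. This is only a sketch: it assumes the sheet is already loaded into a pandas DataFrame (for example by read_ods), promote_header and header_row are hypothetical names, and the right row position is not stated in the thread, so it has to be found by inspecting the DataFrame.

```python
import pandas as pd


def promote_header(df, header_row):
    """Use the row at position `header_row` as column names and drop the rows above it."""
    header = df.iloc[header_row].tolist()
    # Keep only the rows below the header row and renumber them from 0.
    body = df.iloc[header_row + 1:].reset_index(drop=True)
    body.columns = header
    return body
```

For example, promote_header(df, 5) would treat the row at position 5 as the header; the 5 is a placeholder, not a value checked against this file. Trailing note rows, such as the contact lines at the bottom of the output above, would still need to be dropped separately. If the odfpy package is installed, pd.read_excel(path, engine="odf", skiprows=...) may be an alternative, though that was not tried here.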


gsb22 · Accepted Answer · 2021-02-14 12:25:14Z

1

I see that you are just trying to download a file that is in .ods format. Saving it with a .csv extension won't convert it into a CSV file.

The following code will download the file. I have used the requests library, which I find a better option than urllib.

import requests

file_url = "https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/959864/COVID-19-transport-use-statistics.ods"


file_data = requests.get(file_url).content
# create the file in write binary mode, because the data we get from net is in binary
with open("historical.ods", "wb") as file:
    file.write(file_data)

The output file can be viewed in MS Excel.
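The same binary-mode idea can also be applied to the urllib approach from the question: copy the response bytes straight to a file opened with "wb", rather than converting them with str() and splitting on newlines. A minimal sketch using only the standard library; download_file is a hypothetical helper name.

```python
import shutil
from urllib import request


def download_file(url, dest_path):
    """Copy the raw bytes at `url` to `dest_path` without decoding them."""
    # urlopen returns a file-like object of bytes; "wb" writes them unchanged.
    with request.urlopen(url) as response, open(dest_path, "wb") as out_file:
        shutil.copyfileobj(response, out_file)  # copies in chunks, so large files are fine
    return dest_path
```

Calling download_file(file_url, "historical.ods") should produce the same file as the requests version above.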

answered Feb 14, 2021 at 12:25

gsb22


4 Comments

python_coder_ Over a year ago

Thank you! Moving on: without giving the URL of the .ods file, can we download this file starting from the URL of the website?

gsb22 Over a year ago

Yes, you would need BeautifulSoup for that. Instead of requesting the file URL, request the website URL, which returns the page's HTML content. Using the BS library, you can then extract the file's URL, which is embedded somewhere on the page; the rest of the download code stays the same.

python_coder_ Over a year ago

Can you please provide me the code for that? This is the URL: <gov.uk/government/statistics/…>. I want to download the above file using only this URL.

python_coder_ Over a year ago

I tried bs4 and it works. Now I have a problem when I read the ods file: because the downloaded file has a header in its first 6 rows, the dataframe is not interpreted properly. Can you please suggest a modification?
