I want to download the file at the URL below using Python. I tried the code below, but it does not work; I think the problem is the file format. I would be glad if you could suggest changes to my code, or new code that I can use for this purpose.

Link to the website

https://www.gov.uk/government/statistics/transport-use-during-the-coronavirus-covid-19-pandemic

URL required to be downloaded

https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/959864/COVID-19-transport-use-statistics.ods

My Code

from urllib import request


response = request.urlopen("https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/959864/COVID-19-transport-use-statistics.ods")
csv = response.read()


csvstr = str(csv).strip("b'")

lines = csvstr.split("\\n")
f = open("historical.csv", "w")
for line in lines:
   f.write(line + "\n")
f.close()

Basically, I only want to download the file. I have heard that BeautifulSoup can be used for this, but I don't have much experience with it. Any code that serves my purpose would be highly appreciated.

Thanks

  • "it seems like not working" - how exactly is it "not working"? Commented Feb 14, 2021 at 12:07
  • The data is encoded, and the CSV file does not show the actual content. Commented Feb 14, 2021 at 12:11

2 Answers

To download the file:

In [1]: import requests

In [2]: url = 'https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/959864/COVID-19-transport-use-statistics.ods'

In [3]: with open('COVID-19-transport-use-statistics.ods', 'wb') as out_file:
   ...:     # .content holds the raw bytes of the response body
   ...:     content = requests.get(url).content
   ...:     out_file.write(content)

You can then read the file with pandas-ods-reader. First install it by running:

pip install pandas-ods-reader

Then:

In [4]: from pandas_ods_reader import read_ods

In [5]: df = read_ods('COVID-19-transport-use-statistics.ods', 1)

In [6]: df
Out[6]: 
                   Department for Transport statistics  ...   unnamed.9
0    https://www.gov.uk/government/statistics/trans...  ...        None
1                                                 None  ...        None
2    Use of transport modes: Great Britain, since 1...  ...        None
3    Figures are percentages of an equivalent day o...  ...        None
4                                                 None  ...  Percentage
..                                                 ...  ...         ...
390                  Transport for London Tube and Bus  ...        None
391                               Buses (excl. London)  ...        None
392                                           Cycling   ...        None
393                                  Any other queries  ...        None
394                                    Media enquiries  ...        None

You can then save it as a CSV, if that is what you want, using df.to_csv('my_data.csv', index=False).

6 Comments

Thank you! Moving on: without being given the URL of the .ods file, can we download this file starting from the URL of the website?
Yes, this can be done by extracting the target URL from the web page using XPath or BeautifulSoup and then following exactly the same steps I described.
I tried bs4 and it works. Now I have a problem when I read the .ods file: because the file I downloaded has a header block (the first 6 rows), the DataFrame is not interpreted properly. Can you please suggest a modification?
What do you mean by not properly interpreted?
The columns should be Date, Cars, Light Commercial Vehicles, ... But because the .ods file contains a header in its first 6 rows, the columns that are created are different.
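For the header problem raised in these comments, one possible approach is to find the row that holds the real column names and promote it to the header. This is only a minimal sketch: it assumes the real header row is the first one whose first cell is "Date" (going by the column names mentioned in the comment above). The helper name promote_header and all sample values are made up for illustration; they are not real statistics.

```python
import pandas as pd


def promote_header(raw, first_label="Date"):
    """Drop the rows above the real header row and use that row as column names.

    Assumption: the real header is the first row whose first cell equals
    `first_label` after stripping whitespace.
    """
    positions = [
        i for i, value in enumerate(raw.iloc[:, 0])
        if str(value).strip() == first_label
    ]
    if not positions:
        raise ValueError(f"No row starts with {first_label!r}")
    header_pos = positions[0]

    # Keep only the rows below the header row
    table = raw.iloc[header_pos + 1:].copy()
    table.columns = [str(cell).strip() for cell in raw.iloc[header_pos]]
    return table.reset_index(drop=True)


# Made-up stand-in for what read_ods returns: title rows, then the real table.
raw = pd.DataFrame(
    [
        ["Department for Transport statistics", None, None],
        [None, None, None],
        ["Date", "Cars", "Light Commercial Vehicles"],
        ["2020-03-01", 1.0, 1.0],    # placeholder values
        ["2020-03-02", 0.9, 0.95],   # placeholder values
    ],
    columns=["col0", "unnamed.1", "unnamed.2"],
)

clean = promote_header(raw)
print(clean.columns.tolist())
```

With the real file you would pass the DataFrame returned by read_ods instead of the stand-in. Note that the output shown in the answer also has contact rows at the bottom (for example "Media enquiries"), which would still need to be dropped separately.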

I see that you are just trying to download a file that is in .ods format, and I think that saving it with a .csv extension won't convert it into a CSV file.
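To illustrate why renaming does not help: an .ods file is a ZIP archive containing XML, not line-based text, so splitting its bytes on newlines only produces garbage. The sketch below builds a tiny in-memory ZIP as a stand-in for a real .ods file (the entry contents are placeholders) and checks its signature.

```python
import io
import zipfile

# Build a small in-memory ZIP as a stand-in for a real .ods file.
# The entries are placeholders, not a valid spreadsheet.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("mimetype", "application/vnd.oasis.opendocument.spreadsheet")
    archive.writestr("content.xml", "<placeholder/>")
raw = buffer.getvalue()

# ZIP data starts with the local file header signature "PK\x03\x04"
signature = raw[:4]
is_zip = zipfile.is_zipfile(io.BytesIO(raw))
print(signature, is_zip)
```

Running the same two checks on the downloaded bytes is a quick way to confirm that you received a spreadsheet container rather than CSV text.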

The following code will download the file. I have used the requests library, which is a more convenient option than urllib.

import requests

file_url = "https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/959864/COVID-19-transport-use-statistics.ods"


file_data = requests.get(file_url).content
# Open the file in binary write mode, because the downloaded content is raw bytes
with open("historical.ods", "wb") as file:
    file.write(file_data)

The output file can be opened in MS Excel.
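If you would rather stay with urllib, as in the question, the same idea (write the bytes untouched, in binary mode) can be sketched with the standard library alone. The function name download is just an illustrative choice.

```python
import shutil
from urllib import request


def download(url, destination):
    """Copy the response body to disk as raw bytes, without decoding it."""
    with request.urlopen(url) as response, open(destination, "wb") as out_file:
        # Stream in chunks so the whole file need not be held in memory
        shutil.copyfileobj(response, out_file)
    return destination
```

Calling download(file_url, "historical.ods") with the URL from the question should give the same file as the requests version, assuming the server accepts urllib's default request headers.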

4 Comments

Thank you! Moving on: without being given the URL of the .ods file, can we download this file starting from the URL of the website?
Yes, you would need BeautifulSoup for that. Instead of requesting the file URL, request the website URL; it returns the HTML content. Using the BS library, you can then extract the URL of the file, which is embedded somewhere in the page, and the rest of the download code stays the same.
Can you please provide me the code for that? This is the URL <gov.uk/government/statistics/…>. I want to download the above file using only this URL.
I tried bs4 and it works. Now I have a problem when I read the .ods file: because the file I downloaded has a header block (the first 6 rows), the DataFrame is not interpreted properly. Can you please suggest a modification?
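The link-extraction step described in these comments can be sketched as follows. To keep the example dependency-free it uses the standard library's html.parser instead of BeautifulSoup; with bs4 the equivalent would be to loop over soup.find_all("a", href=True). The sample HTML is a made-up stand-in, since the real page markup is not shown in this thread, and the names LinkCollector and find_ods_links are illustrative.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag seen while parsing."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def find_ods_links(html, base_url):
    """Return absolute URLs of all links in `html` that end with .ods."""
    parser = LinkCollector()
    parser.feed(html)
    return [
        urljoin(base_url, href)
        for href in parser.hrefs
        if href.lower().endswith(".ods")
    ]


# Made-up stand-in for the page HTML (assumption: the file is linked via <a href>).
sample_html = """
<html><body>
  <a href="/help">Help</a>
  <a href="https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/959864/COVID-19-transport-use-statistics.ods">Download</a>
</body></html>
"""
page_url = "https://www.gov.uk/government/statistics/transport-use-during-the-coronavirus-covid-19-pandemic"

ods_links = find_ods_links(sample_html, page_url)
print(ods_links)
```

For the real page you would fetch the HTML first, for example with requests.get(page_url).text, pass it to find_ods_links, and then download the first result exactly as in the answer above.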
