
I read CSVs from URLs and would like to build a single dataframe. Each CSV corresponds to a time series of measurements of one parameter at one location (i.e. each URL is associated with a unique location and parameter).

import pandas as pd

parameter = ['pm10', 'pm2.5', 'o3', 'no2']
location = ['Nabel_LUG', 'Nabel_MAG']

urls = []
dfs = []
CSV_URL = 'http://www.oasi.ti.ch/web/rest/measure/csv?domain=air&resolution=y&parameter={}&from=2007-01-01&to=2017-04-28&location={}'
for l in location:
    for p in parameter:
        url = CSV_URL.format(p, l)
        urls.append(url)

urls here is the list of URLs from which I read the CSVs.
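For reference, the same list could also be built in one line with itertools.product; a minimal sketch, assuming the parameter, location and CSV_URL defined above:

from itertools import product

# same Cartesian product as the nested loops: one URL per (location, parameter) pair
urls = [CSV_URL.format(p, l) for l, p in product(location, parameter)]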

dfs = [(pd.read_csv(url, comment='#', sep=';', usecols=[0, 1], index_col='data')) for url in urls]
result_pm10 = pd.concat(dfs, keys=location)

result_pm10 is a dataframe that contains the time series of all locations for a single parameter, e.g.:

            data                PM10

Nabel_LUG   01.07.2011 01:00    21.0
Nabel_LUG   01.07.2012 01:00    21.0
Nabel_LUG   01.07.2013 01:00    18.0
Nabel_LUG   01.07.2014 01:00    15.0
Nabel_LUG   01.07.2015 01:00    18.0
Nabel_LUG   01.07.2016 01:00    16.0
Nabel_LUG   01.07.2017 01:00    24.0
Nabel_MAG   01.07.2011 01:00    24.0
Nabel_MAG   01.07.2012 01:00    21.0
Nabel_MAG   01.07.2013 01:00    19.0
Nabel_MAG   01.07.2014 01:00    15.0
Nabel_MAG   01.07.2015 01:00    19.0
Nabel_MAG   01.07.2016 01:00    15.0
Nabel_MAG   01.07.2017 01:00    22.0
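(For intuition, pd.concat with keys= stacks the frames and adds the location as an outer index level; a minimal toy sketch with made-up values, not the real CSV data:

import pandas as pd

lug = pd.DataFrame({'PM10': [21.0, 18.0]}, index=['01.07.2011 01:00', '01.07.2013 01:00'])
mag = pd.DataFrame({'PM10': [24.0, 19.0]}, index=['01.07.2011 01:00', '01.07.2013 01:00'])
print(pd.concat([lug, mag], keys=['Nabel_LUG', 'Nabel_MAG']))  # MultiIndex: (location, date)
)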

I would like to obtain something like this:

            data                PM10   O3     NO2

Nabel_LUG   01.07.2011 01:00    21.0  683.0  34.0
Nabel_LUG   01.07.2012 01:00    21.0  668.0  32.0
Nabel_LUG   01.07.2013 01:00    18.0  707.0  31.0
Nabel_LUG   01.07.2014 01:00    15.0  366.0  29.0
Nabel_LUG   01.07.2015 01:00    18.0  804.0  30.0
Nabel_LUG   01.07.2016 01:00    16.0  550.0  28.0
Nabel_LUG   01.07.2017 01:00    24.0  45.0   37.0
Nabel_MAG   01.07.2011 01:00    24.0  540.0  20.0
Nabel_MAG   01.07.2012 01:00    21.0  432.0  19.0
Nabel_MAG   01.07.2013 01:00    19.0  494.0  18.0
Nabel_MAG   01.07.2014 01:00    15.0  259.0  20.0
Nabel_MAG   01.07.2015 01:00    19.0  596.0  18.0
Nabel_MAG   01.07.2016 01:00    15.0  363.0  21.0
Nabel_MAG   01.07.2017 01:00    22.0  65.0   24.0

But I'm only able to do this by repeating the above code for each parameter separately and then doing something like:

df_parameter = [result_pm10, result_pm25, result_o3, result_no2]
result = pd.concat(df_parameter, axis=1)

Is there a way to do this more efficiently (especially when there are many more parameters)?

    It seems like for each parameter you access a different URL, in which case I don't think there will be an easier/more efficient solution than the one you proposed. Commented May 4, 2017 at 5:14

1 Answer


The problem is that you overwrite your data. Instead, use two lists and append inside the loops; if you also need to remove columns that are all NaN, add dropna. rename_axis sets the index level names, which become column names after reset_index:

import pandas as pd

parameter = ['pm10', 'pm2.5', 'o3', 'no2']
location = ['Nabel_LUG', 'Nabel_MAG']

dfs = []
CSV_URL = 'http://www.oasi.ti.ch/web/rest/measure/csv?domain=air&resolution=y&parameter={}&from=2007-01-01&to=2017-04-28&location={}'
for l in location:
    dfs1 = []
    for p in parameter:
        url = CSV_URL.format(p, l)
        df = pd.read_csv(url, comment='#', sep=';', usecols=[0, 1], index_col='data')
        dfs1.append(df)
    # one frame per location: parameters side by side, aligned on the 'data' index
    dfs.append(pd.concat(dfs1, axis=1))

# stack the per-location frames, name the index levels, turn them into columns
result_pm10 = (pd.concat(dfs, keys=location)
                 .rename_axis(['location', 'data'])
                 .dropna(axis=1, how='all')
                 .reset_index())
print(result_pm10)

     location              data  PM10     O3   NO2
0   Nabel_LUG  01.07.2007 01:00  27.0  804.0  35.0
1   Nabel_LUG  01.07.2008 01:00  25.0  540.0  34.0
2   Nabel_LUG  01.07.2009 01:00  22.0  651.0  32.0
3   Nabel_LUG  01.07.2010 01:00  21.0  652.0  33.0
4   Nabel_LUG  01.07.2011 01:00  21.0  683.0  34.0
5   Nabel_LUG  01.07.2012 01:00  21.0  668.0  32.0
6   Nabel_LUG  01.07.2013 01:00  18.0  707.0  31.0
7   Nabel_LUG  01.07.2014 01:00  15.0  366.0  29.0
8   Nabel_LUG  01.07.2015 01:00  18.0  804.0  30.0
9   Nabel_LUG  01.07.2016 01:00  16.0  550.0  28.0
10  Nabel_LUG  01.07.2017 01:00  24.0   45.0  37.0
11  Nabel_MAG  01.07.2007 01:00  26.0  607.0  22.0
12  Nabel_MAG  01.07.2008 01:00  23.0  416.0  22.0
13  Nabel_MAG  01.07.2009 01:00  21.0  433.0  21.0
14  Nabel_MAG  01.07.2010 01:00  19.0  527.0  21.0
15  Nabel_MAG  01.07.2011 01:00  24.0  540.0  21.0
16  Nabel_MAG  01.07.2012 01:00  21.0  432.0  20.0
17  Nabel_MAG  01.07.2013 01:00  19.0  494.0  19.0
18  Nabel_MAG  01.07.2014 01:00  15.0  259.0  18.0
19  Nabel_MAG  01.07.2015 01:00  19.0  596.0  20.0
20  Nabel_MAG  01.07.2016 01:00  15.0  363.0  18.0
21  Nabel_MAG  01.07.2017 01:00  22.0   65.0  24.0
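As a side note on why the inner concat works: pd.concat(..., axis=1) aligns the frames on their shared index, so each parameter becomes its own column for that location. A minimal toy sketch with made-up values:

import pandas as pd

idx = pd.Index(['01.07.2016 01:00', '01.07.2017 01:00'], name='data')
pm10 = pd.DataFrame({'PM10': [16.0, 24.0]}, index=idx)
o3 = pd.DataFrame({'O3': [550.0, 45.0]}, index=idx)
print(pd.concat([pm10, o3], axis=1))  # columns PM10 and O3 aligned on 'data'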