
I read CSVs from URLs and would like to build a single dataframe. Each CSV corresponds to a time series of measurements of one parameter at one location (i.e. each URL is associated with a unique location and parameter).

import pandas as pd

parameter = ['pm10', 'pm2.5', 'o3', 'no2']
location = ['Nabel_LUG', 'Nabel_MAG']

urls = []
dfs = []
CSV_URL = 'http://www.oasi.ti.ch/web/rest/measure/csv?domain=air&resolution=y&parameter={}&from=2007-01-01&to=2017-04-28&location={}'
for l in location:
    for p in parameter:
        url = CSV_URL.format(p, l)
        urls.append(url)

urls here is the list of URLs from which I read the CSVs.
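For reference, the same list could also be built in one line with itertools.product; a minimal sketch, assuming the parameter, location and CSV_URL defined above:

from itertools import product

# same Cartesian product as the nested loops: one URL per (location, parameter) pair
urls = [CSV_URL.format(p, l) for l, p in product(location, parameter)]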

dfs = [(pd.read_csv(url, comment='#', sep=';', usecols=[0, 1], index_col='data')) for url in urls]
result_pm10 = pd.concat(dfs, keys=location)

result_pm10 is a dataframe that contains the time series of all locations for a single parameter, e.g.:

            data                PM10

Nabel_LUG   01.07.2011 01:00    21.0
Nabel_LUG   01.07.2012 01:00    21.0
Nabel_LUG   01.07.2013 01:00    18.0
Nabel_LUG   01.07.2014 01:00    15.0
Nabel_LUG   01.07.2015 01:00    18.0
Nabel_LUG   01.07.2016 01:00    16.0
Nabel_LUG   01.07.2017 01:00    24.0
Nabel_MAG   01.07.2011 01:00    24.0
Nabel_MAG   01.07.2012 01:00    21.0
Nabel_MAG   01.07.2013 01:00    19.0
Nabel_MAG   01.07.2014 01:00    15.0
Nabel_MAG   01.07.2015 01:00    19.0
Nabel_MAG   01.07.2016 01:00    15.0
Nabel_MAG   01.07.2017 01:00    22.0
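(For intuition, pd.concat with keys= stacks the frames and adds the location as an outer index level; a minimal toy sketch with made-up values, not the real CSV data:

import pandas as pd

lug = pd.DataFrame({'PM10': [21.0, 18.0]}, index=['01.07.2011 01:00', '01.07.2013 01:00'])
mag = pd.DataFrame({'PM10': [24.0, 19.0]}, index=['01.07.2011 01:00', '01.07.2013 01:00'])
print(pd.concat([lug, mag], keys=['Nabel_LUG', 'Nabel_MAG']))  # MultiIndex: (location, date)
)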

I would like to obtain something like this:

            data                PM10   O3     NO2

Nabel_LUG   01.07.2011 01:00    21.0  683.0  34.0
Nabel_LUG   01.07.2012 01:00    21.0  668.0  32.0
Nabel_LUG   01.07.2013 01:00    18.0  707.0  31.0
Nabel_LUG   01.07.2014 01:00    15.0  366.0  29.0
Nabel_LUG   01.07.2015 01:00    18.0  804.0  30.0
Nabel_LUG   01.07.2016 01:00    16.0  550.0  28.0
Nabel_LUG   01.07.2017 01:00    24.0  45.0   37.0
Nabel_MAG   01.07.2011 01:00    24.0  540.0  20.0
Nabel_MAG   01.07.2012 01:00    21.0  432.0  19.0
Nabel_MAG   01.07.2013 01:00    19.0  494.0  18.0
Nabel_MAG   01.07.2014 01:00    15.0  259.0  20.0
Nabel_MAG   01.07.2015 01:00    19.0  596.0  18.0
Nabel_MAG   01.07.2016 01:00    15.0  363.0  21.0
Nabel_MAG   01.07.2017 01:00    22.0  65.0   24.0

But I'm only able to do this by repeating the above code for each parameter separately and then doing something like:

df_parameter = [result_pm10, result_pm25, result_o3, result_no2]
result = pd.concat(df_parameter, axis=1)

Is there a way to do this more efficiently (especially when there are many more parameters)?

    It seems like for each parameter you access a different URL, in which case I don't think there will be an easier/more efficient solution than the one you proposed. Commented May 4, 2017 at 5:14

1 Answer


The problem is that you overwrite your data. Instead, use two lists and append inside the loops; if you also need to remove columns that are all NaN, add dropna. rename_axis sets the index level names, which become column names after reset_index:

import pandas as pd

parameter = ['pm10', 'pm2.5', 'o3', 'no2']
location = ['Nabel_LUG', 'Nabel_MAG']

dfs = []
CSV_URL = 'http://www.oasi.ti.ch/web/rest/measure/csv?domain=air&resolution=y&parameter={}&from=2007-01-01&to=2017-04-28&location={}'
for l in location:
    dfs1 = []
    for p in parameter:
        url = CSV_URL.format(p, l)
        df = pd.read_csv(url, comment='#', sep=';', usecols=[0, 1], index_col='data')
        dfs1.append(df)
    # one frame per location: parameters side by side, aligned on the 'data' index
    dfs.append(pd.concat(dfs1, axis=1))

# stack the per-location frames, name the index levels, turn them into columns
result_pm10 = (pd.concat(dfs, keys=location)
                 .rename_axis(['location', 'data'])
                 .dropna(axis=1, how='all')
                 .reset_index())
print(result_pm10)

     location              data  PM10     O3   NO2
0   Nabel_LUG  01.07.2007 01:00  27.0  804.0  35.0
1   Nabel_LUG  01.07.2008 01:00  25.0  540.0  34.0
2   Nabel_LUG  01.07.2009 01:00  22.0  651.0  32.0
3   Nabel_LUG  01.07.2010 01:00  21.0  652.0  33.0
4   Nabel_LUG  01.07.2011 01:00  21.0  683.0  34.0
5   Nabel_LUG  01.07.2012 01:00  21.0  668.0  32.0
6   Nabel_LUG  01.07.2013 01:00  18.0  707.0  31.0
7   Nabel_LUG  01.07.2014 01:00  15.0  366.0  29.0
8   Nabel_LUG  01.07.2015 01:00  18.0  804.0  30.0
9   Nabel_LUG  01.07.2016 01:00  16.0  550.0  28.0
10  Nabel_LUG  01.07.2017 01:00  24.0   45.0  37.0
11  Nabel_MAG  01.07.2007 01:00  26.0  607.0  22.0
12  Nabel_MAG  01.07.2008 01:00  23.0  416.0  22.0
13  Nabel_MAG  01.07.2009 01:00  21.0  433.0  21.0
14  Nabel_MAG  01.07.2010 01:00  19.0  527.0  21.0
15  Nabel_MAG  01.07.2011 01:00  24.0  540.0  21.0
16  Nabel_MAG  01.07.2012 01:00  21.0  432.0  20.0
17  Nabel_MAG  01.07.2013 01:00  19.0  494.0  19.0
18  Nabel_MAG  01.07.2014 01:00  15.0  259.0  18.0
19  Nabel_MAG  01.07.2015 01:00  19.0  596.0  20.0
20  Nabel_MAG  01.07.2016 01:00  15.0  363.0  18.0
21  Nabel_MAG  01.07.2017 01:00  22.0   65.0  24.0
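As a side note on why the inner concat works: pd.concat(..., axis=1) aligns the frames on their shared index, so each parameter becomes its own column for that location. A minimal toy sketch with made-up values:

import pandas as pd

idx = pd.Index(['01.07.2016 01:00', '01.07.2017 01:00'], name='data')
pm10 = pd.DataFrame({'PM10': [16.0, 24.0]}, index=idx)
o3 = pd.DataFrame({'O3': [550.0, 45.0]}, index=idx)
print(pd.concat([pm10, o3], axis=1))  # columns PM10 and O3 aligned on 'data'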