I have a number of time-series .csv files, each of which I read into a dataframe (df). I would like to create another dataframe that holds the sum of all of these dataframes added together.

Examples of the dataframes are as follows. Example df 1:

         date BBG.XASX.ABP.S_price BBG.XASX.ABP.S_pos BBG.XASX.ABP.S_trade \ 
0  2017-09-11            2.8303586                0.0                  0.0   
1  2017-09-12            2.8135189                0.0                  0.0   
2  2017-09-13            2.7829274            86614.0              86614.0   
3  2017-09-14            2.7928042            86614.0                  0.0   
4  2017-09-15            2.8120383            86614.0                  0.0   

  BBG.XASX.ABP.S_cost BBG.XASX.ABP.S_pnl_pre_cost 
0                -0.0                         0.0   
1                -0.0                         0.0    
2    -32.540463966186                         0.0   
3                -0.0           855.4691551999713             
4                -0.0           1665.942337400047  

Example df 2:

        date BBG.XASX.AHG.S_price BBG.XASX.AHG.S_pos BBG.XASX.AHG.S_trade  \
0  2017-09-11            2.6068676                0.0                  0.0   
1  2017-09-12            2.6044785            76439.0              76439.0   
2  2017-09-13   2.6024171000000003            76439.0                  0.0   
3  2017-09-14            2.6139929            76439.0                  0.0   
4  2017-09-15            2.6602836            76439.0                  0.0   

   BBG.XASX.AHG.S_cost BBG.XASX.AHG.S_pnl_pre_cost 
0                 -0.0                         0.0   
1  -26.876303828302497                         0.0   
2                 -0.0          -157.5713545999606   
3                 -0.0           884.8425761999679   
4                 -0.0           3538.414817300014  

Example df 3:

  date BBG.XASX.AGL.S_price BBG.XASX.AGL.S_pos BBG.XASX.AGL.S_trade  \
0  2017-09-18           18.8195983                0.0                  0.0   
1  2017-09-19           18.5104704            40613.0              40613.0   
2  2017-09-20           18.2010515            40613.0                  0.0   
3  2017-09-21           18.2217768            40613.0                  0.0   
4  2017-09-22            17.840112            40613.0                  0.0   

  BBG.XASX.AGL.S_cost BBG.XASX.AGL.S_pnl_pre_cost 
0                -0.0                         0.0                          
1   -101.488374137952                         0.0    
2                -0.0          -12566.42978570005   
3                -0.0           841.7166089001112    
4                -0.0         -15500.552522399928

Adding the example dataframes together, the code should return the following output:

output:

date                 1       2      3              4               5               6
11/09/2017   5.4372262       0      0              0               0               0
12/09/2017   5.4179974   76439  76439              2    -26.87630383               0
13/09/2017   5.3853445  163053  86614              4    -32.54046397    -157.5713546
14/09/2017   5.4067971  163053      0              6               0     1740.311731
15/09/2017   5.4723219  163053      0              8               0     5204.357155
18/09/2017  18.8195983       0      0              0               0               0
19/09/2017  18.5104704   40613  40613   -101.4883741               0               0
20/09/2017  18.2010515   40613      0              0    -12566.42979               0
21/09/2017  18.2217768   40613      0              0     841.7166089               0
22/09/2017   17.840112   40613      0              0    -15500.55252               0

All of the dataframes have the same number of columns in the same order. Note that the dates in the individual dataframes can differ, and I would like to see the totals for each individual day.

The code I use to generate the individual dataframes is:

for subdirname in glob.iglob('C:/Users/stacey/WorkDocs/tradeopt/'+filename+'//BBG*/tradeopt.is-pnl*.lzma', recursive=True):
    df = pd.DataFrame(numpy.zeros((0,27)))

    out = []
    with lzma.open(subdirname, mode='rt') as file:
        print(subdirname)
        for line in file:
            items = line.split(",")
            out.append(items)
            if len(out) > 0:
                a = pd.DataFrame(out[1:], columns=out[0])    

How can I add the individual dataframes together into a single sumdf?

1 Answer

The idea is to convert the date column to a DatetimeIndex and split the column names by . into a MultiIndex:

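import glob
import lzma

import pandas as pd

dfs = []
# `filename` is defined elsewhere, as in the question
for subdirname in glob.iglob('C:/Users/stacey/WorkDocs/tradeopt/'+filename+'//BBG*/tradeopt.is-pnl*.lzma', recursive=True):
    out = []
    with lzma.open(subdirname, mode='rt') as file:
        print(subdirname)
        for line in file:
            # strip the trailing newline before splitting into fields
            items = line.strip().split(",")
            out.append(items)
    if len(out) > 0:
        # the first row is the header; index each frame by date
        a = pd.DataFrame(out[1:], columns=out[0]).set_index('date')
        a.index = pd.to_datetime(a.index)
        dfs.append(a)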
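
One detail worth checking: because each line is split on commas, every value in a frame built this way is a string, so a later sum would not add them as numbers. Below is a minimal sketch of the conversion, using a few hypothetical rows in place of the real file contents (the rows and the extra pd.to_numeric step are assumptions, not part of the original code):

```python
import pandas as pd

# Hypothetical rows, shaped like the result of line.strip().split(",")
out = [
    ["date", "BBG.XASX.ABP.S_price", "BBG.XASX.ABP.S_pos"],
    ["2017-09-11", "2.8303586", "0.0"],
    ["2017-09-12", "2.8135189", "0.0"],
]

a = pd.DataFrame(out[1:], columns=out[0]).set_index("date")
a.index = pd.to_datetime(a.index)

# Convert every column from text to a numeric dtype
a = a.apply(pd.to_numeric)
```

After this step the columns have a float dtype, so they can be added across frames.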

Then use concat and sum by column names:

df = pd.concat(dfs, axis=1).sum(level=0, axis=1)
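
Note that the level argument of sum was deprecated in pandas 1.3 and removed in pandas 2.0, and that the column names differ per ticker, so grouping on the full name would not combine them. The following is only a sketch of one alternative, assuming the goal is to total each field (price, pos, and so on) per date across tickers. The make_frame helper and its sample values are hypothetical, and the frames are assumed to be numeric already:

```python
import pandas as pd


def make_frame(ticker, dates, price, pos):
    """Build a small hypothetical frame shaped like those in the question."""
    return pd.DataFrame(
        {f"BBG.XASX.{ticker}.S_price": price, f"BBG.XASX.{ticker}.S_pos": pos},
        index=pd.to_datetime(dates),
    )


dfs = [
    make_frame("ABP", ["2017-09-11", "2017-09-12"], [2.5, 2.75], [0.0, 100.0]),
    make_frame("AHG", ["2017-09-12", "2017-09-18"], [1.25, 1.5], [50.0, 50.0]),
]

# Drop the ticker prefix so matching fields share one column name,
# e.g. "BBG.XASX.ABP.S_price" becomes "S_price"
renamed = [d.rename(columns=lambda c: c.rsplit(".", 1)[-1]) for d in dfs]

# Stack the frames row-wise, then total all rows that share a date
total = pd.concat(renamed).groupby(level=0).sum()
print(total)
```

Dates that appear in only one frame keep that frame's values, while dates shared by several frames are summed, which matches the per-day totals the question asks for.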

3 Comments

Thanks jezrael, I'm getting the error 'ValueError: No axis named 1 for object type <class 'pandas.core.series.Series'>' associated with the line of code: df = pd.concat(dfs, axis=1).sum(level=[0,1,3], axis=1). Any idea what might be causing the issue?
thanks, no the data is not confidential, would you like me to post it somewhere?
Thanks @jezrael, I'll send the files on later on if that's ok?
