0

I have two pandas dataframes with the same column names, and same (two) multi-index columns ('symbol' and 'date') but some of the indices are different and some of the data is different.

df1 index columns: ['symbol','date']
symbol  date     o    c
aa      2015/1/1 1    1
aa      2015/2/1 2    2
bb      2015/1/1 71   71

df2 index columns: ['symbol','date']

symbol  date     o    c
aa      2015/1/1 1    1
bb      2016/2/1 2    2
bb      2015/1/1 51   55

I first want to create a shared dataframe with only the lines in both (an innerjoin with the default dataframe.merge() and keep only the index columns. Is there any way to do that in one shot?

Currently I do it the hard way:

merged = df1.merge(df2)  
analyzed = merged[['symbol','date']].copy

Now comes the question:

I want to have the analyzed dataframe with the following columns: o1 c1 o2 c2 with the data from the two files. how do I ".loc" to get the data by index?

I want something like the following code, but that gives an exception:

analyzed['o1'] = analyzed.apply(lambda row: df1['o'].loc[[row.symbol, row.date]]
# or maybe like this: 
analyzed['o1'] = analyzed.apply(lambda row: df1.at[ [row['symbol'], row['date']], 'o'] )

How do I do this? Or (in another way to ask) how do I get the current row's symbol and date values, and how do I use them to set the corresponding row in df1 (or df2)? Perhaps something with get_level_values? If so, how?

1 Answer 1

1

IIUC:

Using join:

analyze = df1.join(df2,lsuffix='_l')

Output:

                 o_l  c_l     o     c
symbol date                          
aa     2015/1/1    1    1   1.0   1.0
       2015/2/1    2    2   NaN   NaN
bb     2015/1/1   71   71  51.0  55.0

Selection:

analyze.loc[('aa','2015/1/1')]

o_l    1.0
c_l    1.0
o      1.0
c      1.0
Name: (aa, 2015/1/1), dtype: float64

Using merge with indexes:

analyze1 = df1.merge(df2, left_index=True, right_index=True)

Output:

                 o_x  c_x  o_y  c_y
symbol date                        
aa     2015/1/1    1    1    1    1
bb     2015/1/1   71   71   51   55

Selection:

analyze1.loc[('aa','2015/1/1')]

o_x    1
c_x    1
o_y    1
c_y    1
Name: (aa, 2015/1/1), dtype: int64
Sign up to request clarification or add additional context in comments.

5 Comments

Wow thanks. So I accomplish the analyzed. Now I can do the lambdas with the data in the analyzed row, never searching for info in the other dataframe. From your code I also understood that if I would (wrongly) wish to use a lambda I would get to the df1 column of that index with: analyzed['o1'] = analyzed.apply( lambda row: df1['o'].loc[(row['symbol'], row['date'])]) but that doesn't work. (Syntax error). How do I get the certain column from the .loc[(indexcol1, indexcol2)]
I also tried analyzed['o1'] = analyzed.apply( lambda row: df1.loc[(row['symbol'], row['date']), 'open']) according to these docs shanelynn.ie/… but I get an error
@pashute You don't need apply and lambdas. You just need to use .loc[rowindex, colindex]. Since your rowindex is multiindex, it looks a bit strange analyze.loc[('aa','2015/1/1'),['o_x','o_y']), the first argument is a tuple for the multiindex, the second argument is list for two columns.
Thanks @Scott_Boston ! That explains it. The first a tuple, the second a list. And I understood already that once I have the join, there is no need for apply nor lambda. I can set analyzed['o_indicator'] = analyzed.['o_x'] / analyzed.['o_y'] - 1
BUT..... IF I do wish to use an apply for some reason, I wanted to know how. So now I know how. Instead of the example code from my previous comment, I would use: analyzed['o_new'] = analyzed.apply(lambda row: df1.loc[(row['symbol'], row['datetime']),['o']].value so df1.loc would receive a tuple (with values from the row), and then a list with a single column's name, to retrieve the single value of that row... A bad and useless example, but shows how to use the df1.loc inside apply with lambda that depends on a different dataset.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.