I'm not sure whether this has been answered before. My requirement is as follows: I have a dataframe like this:

df1:

           A    B
I1  I2
x11 x12  a11  b11
x12 x22  a21  b21

Note that this has a MultiIndex of [I1, I2] and columns [A, B].

and then another dataframe like this:

df2:

    I1   I2
0  x11  x12
1  y11  y12

This has columns [I1, I2], the same names as the MultiIndex levels of df1.

What I would like to create is two dataframes, as shown below:

df3, which contains the rows of df1 whose index matches a row of (I1, I2) values in df2:

A  B
a11 b11

df4, with the remaining rows, i.e.:

A  B
a21 b21

I know how to do this using iterrows() but it is not efficient. Looking for a vectorized solution. Thanks.

2 Answers

Let us try reset_index with merge (with no on argument, merge joins on the columns the two frames share, here I1 and I2):

df3 = df1.reset_index().merge(df2).set_index(['I1', 'I2'])
df4 = df1.drop(df3.index)

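Or reindex df1 on a MultiIndex built from df2 (note that dropna also discards any matching row that already contains NaN in df1):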

idx = pd.MultiIndex.from_frame(df2)
df3 = df1.reindex(idx).dropna()
df4 = df1.drop(df3.index)
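
For anyone who wants to run both approaches end to end, here is a minimal sketch. The sample values are placeholders copied from the tables in the question, and df3_alt is just a name introduced here to hold the reindex result. I pass on=['I1', 'I2'] explicitly, which should give the same result as the default in this case.

```python
import pandas as pd

# Placeholder data mirroring the tables in the question.
df1 = pd.DataFrame(
    {"A": ["a11", "a21"], "B": ["b11", "b21"]},
    index=pd.MultiIndex.from_tuples(
        [("x11", "x12"), ("x12", "x22")], names=["I1", "I2"]
    ),
)
df2 = pd.DataFrame({"I1": ["x11", "y11"], "I2": ["x12", "y12"]})

# Approach 1: turn the index into columns, inner-merge on them,
# then restore the MultiIndex.
df3 = df1.reset_index().merge(df2, on=["I1", "I2"]).set_index(["I1", "I2"])
df4 = df1.drop(df3.index)

# Approach 2: reindex on df2's (I1, I2) pairs; pairs missing from df1
# become all-NaN rows, which dropna() then removes.
idx = pd.MultiIndex.from_frame(df2)
df3_alt = df1.reindex(idx).dropna()

print(df3)
print(df4)
```

Both approaches assume the index of df1 is unique. The reindex variant in particular will raise on a duplicated index, so the merge variant is probably the safer default.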

2 Comments

Thank you, I'm going to check on this. On another note, with the first approach I get a MultiIndex of the form [I1, [I21, I22]]. How can I flatten this into [I1, I21], [I1, I22]?
@check itertools.product?

Just to record another way of doing it:

I could call set_index on df2 with ['I1', 'I2'] and then use isin:

is_index_there = df1.index.isin(df2.set_index(['I1', 'I2']).index)

and then use that boolean mask to create the two separate dataframes:

df3 = df1.loc[is_index_there]

df4 = df1.loc[~is_index_there]
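
A runnable sketch of this mask-based split, assuming the placeholder data from the question. Here mask plays the role of is_index_there, and I build the lookup index with pd.MultiIndex.from_frame instead of set_index, which should be equivalent for this purpose. Both output frames are taken from df1.

```python
import pandas as pd

# Placeholder data mirroring the tables in the question.
df1 = pd.DataFrame(
    {"A": ["a11", "a21"], "B": ["b11", "b21"]},
    index=pd.MultiIndex.from_tuples(
        [("x11", "x12"), ("x12", "x22")], names=["I1", "I2"]
    ),
)
df2 = pd.DataFrame({"I1": ["x11", "y11"], "I2": ["x12", "y12"]})

# Boolean array: True where a df1 index tuple also appears as an
# (I1, I2) row in df2.
mask = df1.index.isin(pd.MultiIndex.from_frame(df2[["I1", "I2"]]))

df3 = df1.loc[mask]    # rows of df1 whose index is present in df2
df4 = df1.loc[~mask]   # the remaining rows of df1

print(df3)
print(df4)
```

Unlike the reindex/dropna route, this keeps matching rows that happen to contain NaN, and it should also tolerate a non-unique index on df1.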
