50

I have a DataFrame df and I would like to select only the rows whose index value also appears in df1.index.

For example:

In [96]: df
Out[96]:
   A  B  C  D
1  1  4  9  1
2  4  5  0  2
3  5  5  1  0
22 1  3  9  6

and this index:

In [96]: df1.index
Out[96]:
Int64Index([  1,   3,   4,   5,   6,   7,  22,  28,  29,  32,], dtype='int64', length=253)

I would like this output:

In [96]: df
Out[96]:
   A  B  C  D
1  1  4  9  1
3  5  5  1  0
22 1  3  9  6

2 Answers

84

Use isin:

df = df[df.index.isin(df1.index)]
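
As a minimal, self-contained sketch of the isin approach, here is the data from the question rebuilt by hand. Note the assumption: df1 below is only a placeholder carrying the ten index labels visible in the question, not the full index of length 253, and the names mask and filtered are introduced just for illustration.

```python
import pandas as pd

# Data from the question.
df = pd.DataFrame(
    {"A": [1, 4, 5, 1], "B": [4, 5, 5, 3], "C": [9, 0, 1, 9], "D": [1, 2, 0, 6]},
    index=[1, 2, 3, 22],
)

# Assumption: a stand-in df1 with only the labels shown in the question.
df1 = pd.DataFrame(index=[1, 3, 4, 5, 6, 7, 22, 28, 29, 32])

# Boolean mask: True where a row label of df also occurs in df1.index.
mask = df.index.isin(df1.index)

# Boolean indexing keeps the matching rows, in df's original order.
filtered = df[mask]
print(filtered)
```

Labels that exist only in df1.index (4, 5, 6, ...) are simply ignored, so no missing-label error or NaN row can occur.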

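Or compute the intersection of the two indexes and select with loc (any one of the following):

df = df.loc[df.index & df1.index]  # older pandas only; newer releases no longer treat & on an Index as a set operation
df = df.loc[np.intersect1d(df.index, df1.index)]  # requires: import numpy as np
df = df.loc[df.index.intersection(df1.index)]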

print (df)
    A  B  C  D
1   1  4  9  1
3   5  5  1  0
22  1  3  9  6

EDIT:

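A comment asked: "I tried the solution df = df.loc[df1.index]. Do you think this solution is correct?"

That solution is incorrect, because df1.index contains labels that are missing from df. In the pandas version used here, the missing labels come back as NaN rows together with a FutureWarning: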

df = df.loc[df1.index]
print (df)

      A    B    C    D
1   1.0  4.0  9.0  1.0
3   5.0  5.0  1.0  0.0
4   NaN  NaN  NaN  NaN
5   NaN  NaN  NaN  NaN
6   NaN  NaN  NaN  NaN
7   NaN  NaN  NaN  NaN
22  1.0  3.0  9.0  6.0
28  NaN  NaN  NaN  NaN
29  NaN  NaN  NaN  NaN
32  NaN  NaN  NaN  NaN
C:/Dropbox/work-joy/so/_t/t.py:23: FutureWarning: 
Passing list-likes to .loc or [] with any missing label will raise
KeyError in the future, you can use .reindex() as an alternative.

See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate-loc-reindex-listlike
  print (df)
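
The warning points to .reindex() for the case where the NaN rows are actually wanted. A hedged sketch of that alternative follows; labels is a stand-in for df1.index with only the labels shown in the question, and expanded and present_only are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 4, 5, 1], "B": [4, 5, 5, 3]}, index=[1, 2, 3, 22])

# Assumption: stand-in for df1.index, limited to the labels shown in the question.
labels = pd.Index([1, 3, 4, 5, 6, 7, 22, 28, 29, 32])

# reindex keeps every requested label; labels absent from df become NaN rows,
# and the integer columns are upcast to float so they can hold NaN.
expanded = df.reindex(labels)

# Dropping the all-NaN rows leaves only the labels that exist in df.
# Caveat: this would also drop rows of df that were entirely NaN to begin with.
present_only = expanded.dropna(how="all")

print(expanded)
print(present_only)
```

If the NaN rows are not wanted in the first place, filtering with isin or an index intersection avoids creating them.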

5 Comments

I tried solution: df = df.loc[df1.index]. Do you think that this solution is correct?
@giupardeb - I tested it and added the result to the answer. Please check it.
I want to add that df = df.loc[df1.index] doesn't work
@song0089 - Ya, so use df = df[df.index.isin(df1.index)]
Beware of this usage of isin(): it keeps the rows in df's order, so the result will not always be in the same order as df1.
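
The ordering point can be illustrated with a small sketch. The data here is hypothetical (not from the question) and is deliberately unsorted so the difference is visible; label 99 exists only in df1.

```python
import pandas as pd

# Hypothetical data with a deliberately unsorted index.
df = pd.DataFrame({"A": [10, 20, 30, 40]}, index=[22, 1, 2, 3])
df1 = pd.DataFrame(index=[3, 22, 1, 99])  # 99 does not exist in df

# Masking df keeps the rows in df's own order.
in_df_order = df[df.index.isin(df1.index)]

# To follow df1's order instead, filter df1.index down to the labels that
# exist in df (a boolean mask preserves df1's order) and pass them to .loc.
labels_in_df1_order = df1.index[df1.index.isin(df.index)]
in_df1_order = df.loc[labels_in_df1_order]

print(in_df_order)
print(in_df1_order)
```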
12

Passing the index to the row indexer/slicer of .loc now works; you just need to make sure to specify the columns as well, i.e.:

df = df.loc[df1.index, :]  # works

and NOT

df = df.loc[df1.index] # won't work

IMO this is neater and more consistent with the expected usage of .loc.

3 Comments

You're right, it seems that the devs have changed the implementation. However, now both seem to raise the KeyError warning if you pass in a list containing values that don't exist in the index.
This solution has the advantage of ensuring order of df and df1 are the same.
This only works if df1's entire index is contained in df's index; the accepted answer does not have that limitation.
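
One way to act on that limitation is to check for missing labels before calling .loc. The sketch below is an assumption-laden illustration: labels stands in for df1.index with only the labels shown in the question, and missing, all_present and result are names made up for this example.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 4, 5, 1]}, index=[1, 2, 3, 22])

# Assumption: stand-in for df1.index, limited to the labels shown in the question.
labels = pd.Index([1, 3, 4, 5, 6, 7, 22, 28, 29, 32])

# Labels requested but absent from df.
missing = labels.difference(df.index)

# True only when every requested label exists in df.
all_present = labels.isin(df.index).all()

if all_present:
    # Safe: every label exists, so .loc cannot hit a missing key.
    result = df.loc[labels, :]
else:
    # Fall back to the common labels only.
    result = df.loc[labels.intersection(df.index), :]

print(sorted(missing))
print(result)
```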
