Select certain rows by index of another DataFrame

Question

I have a DataFrame and I want to select only the rows whose index value appears in df1.index.

For example:

In [96]: df
Out[96]:
   A  B  C  D
1  1  4  9  1
2  4  5  0  2
3  5  5  1  0
22 1  3  9  6

and this index:

In [96]: df1.index
Out[96]:
Int64Index([  1,   3,   4,   5,   6,   7,  22,  28,  29,  32,], dtype='int64', length=253)

I would like this output:

In [96]: df
Out[96]:
   A  B  C  D
1  1  4  9  1
3  5  5  1  0
22 1  3  9  6

jezrael · Accepted Answer · 2018-02-19 11:50:00Z

84

Use isin:

df = df[df.index.isin(df1.index)]

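Or compute the intersection of the two indexes and select with loc:

df = df.loc[df.index & df1.index]  # older pandas only: recent versions no longer treat & on an Index as a set intersection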
df = df.loc[np.intersect1d(df.index, df1.index)]
df = df.loc[df.index.intersection(df1.index)]

print (df)
    A  B  C  D
1   1  4  9  1
3   5  5  1  0
22  1  3  9  6
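
For anyone who wants to run this, here is a minimal sketch that rebuilds the frame from the question and compares the isin mask with the intersection lookup. It assumes df1 only needs the ten index labels visible in the question (the real df1 has 253 rows, which are not shown), and the names by_mask and by_labels are just illustrative.

```python
import pandas as pd

# Frame from the question.
df = pd.DataFrame(
    {"A": [1, 4, 5, 1], "B": [4, 5, 5, 3], "C": [9, 0, 1, 9], "D": [1, 2, 0, 6]},
    index=[1, 2, 3, 22],
)

# Assumption: only the ten labels visible in the question are used here.
df1 = pd.DataFrame(index=pd.Index([1, 3, 4, 5, 6, 7, 22, 28, 29, 32]))

# Boolean mask: keep the rows of df whose label also occurs in df1.index.
by_mask = df[df.index.isin(df1.index)]

# Label lookup: restrict to the shared labels first, then select with .loc.
by_labels = df.loc[df.index.intersection(df1.index)]

print(by_mask)
```

Both variables should hold the same three rows; the mask version never looks up a label that is absent from df, which is why it cannot fail on missing labels.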

EDIT:

I tried solution: df = df.loc[df1.index]. Do you think that this solution is correct?

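That solution is incorrect here, because df1.index contains labels that are missing from df. The missing labels come back as NaN rows, together with a warning that this will become a KeyError: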

df = df.loc[df1.index]
print (df)

      A    B    C    D
1   1.0  4.0  9.0  1.0
3   5.0  5.0  1.0  0.0
4   NaN  NaN  NaN  NaN
5   NaN  NaN  NaN  NaN
6   NaN  NaN  NaN  NaN
7   NaN  NaN  NaN  NaN
22  1.0  3.0  9.0  6.0
28  NaN  NaN  NaN  NaN
29  NaN  NaN  NaN  NaN
32  NaN  NaN  NaN  NaN
C:/Dropbox/work-joy/so/_t/t.py:23: FutureWarning: 
Passing list-likes to .loc or [] with any missing label will raise
KeyError in the future, you can use .reindex() as an alternative.

See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate-loc-reindex-listlike
  print (df)
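
The warning points to .reindex() as the alternative, so a small sketch of the difference may help. The labels index below is hypothetical (label 4 is deliberately absent from df), and whether .loc raises depends on the installed pandas version, so the code only records what happened instead of assuming it.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 4, 5, 1]}, index=[1, 2, 3, 22])
labels = pd.Index([1, 3, 4, 22])  # hypothetical; 4 is not in df.index

# reindex keeps every requested label and fills the missing ones with NaN.
padded = df.reindex(labels)

# On pandas versions where missing labels are an error, .loc raises KeyError.
try:
    df.loc[labels]
    loc_raised = False
except KeyError:
    loc_raised = True

# Restricting to the shared labels first avoids both the NaN rows and the error.
safe = df.loc[labels.intersection(df.index)]

print(padded)
print("loc raised KeyError:", loc_raised)
print(safe)
```

So reindex is the explicit way to ask for the NaN-padded result, while filtering the labels first gives the output the question asks for.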

edited Feb 19, 2018 at 11:50

answered Feb 19, 2018 at 11:16


5 Comments

giupardeb Over a year ago

I tried solution: df = df.loc[df1.index]. Do you think that this solution is correct?

jezrael Over a year ago

@giupardeb - I tested it and added it to the answer. Please check it.

Asteroid098 Over a year ago

I want to add that df = df.loc[df1.index] doesn't work

jezrael Over a year ago

@song0089 - Ya, so use df = df[df.index.isin(df1.index)]

DocOc Over a year ago

Beware of this usage of isin() as it will not result in df and df1 always being in the same order.
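
To illustrate the ordering point, here is a rough sketch. The index called other is a hypothetical stand-in for df1.index with its labels in a different order, and the variable names are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 4, 5, 1]}, index=[1, 2, 3, 22])
other = pd.Index([22, 3, 1, 4])  # hypothetical stand-in for df1.index

# The isin mask filters df in place, so the rows stay in df's own order.
in_df_order = df[df.index.isin(other)]

# To follow the other index's order, filter that index and then select by label.
shared = other[other.isin(df.index)]
in_other_order = df.loc[shared]

print(in_df_order.index.tolist())
print(in_other_order.index.tolist())
```

The first result follows df, the second follows the other index, and label 4 is dropped in both because it does not exist in df.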

Hansang · Answer · 2019-07-02 06:13:34Z

12

Passing the index to the row indexer/slicer of .loc now works; you just need to specify the columns as well, i.e.:

df = df.loc[df1.index, :]  # works

and NOT

df = df.loc[df1.index] # won't work

IMO this is neater and more consistent with the expected usage of .loc

answered Jul 2, 2019 at 6:13


3 Comments

Hansang Over a year ago

You're right, it seems that the devs have changed the implementation. However, both forms now seem to raise the KeyError warning if you pass in a list containing values that don't exist in the index.

DocOc Over a year ago

This solution has the advantage of ensuring order of df and df1 are the same.

fantabolous Over a year ago

this only works if df1's entire index is contained in df's index; the accepted answer does not have that limitation
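
One way to check that limitation before relying on .loc is to list the labels that are missing from df. This is only a sketch: df1_index holds just the ten labels visible in the question, and the fallback to the shared labels is my own assumption about what you would want in that case.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 4, 5, 1]}, index=[1, 2, 3, 22])
# Assumption: only the labels visible in the question.
df1_index = pd.Index([1, 3, 4, 5, 6, 7, 22, 28, 29, 32])

# Labels requested by df1 that df does not have.
missing = df1_index.difference(df.index)

if missing.empty:
    # Every label exists, so a plain .loc lookup is safe.
    result = df.loc[df1_index, :]
else:
    # Otherwise fall back to the labels the two indexes share.
    result = df.loc[df1_index.intersection(df.index), :]

print(missing.tolist())
print(result)
```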
