Select rows of pandas dataframe from list, in order of list

Question

This question was originally asked here as a comment, but it never got a proper answer because the question there was marked as a duplicate.

For a given pandas.DataFrame, let us say

import pandas as pd

df = pd.DataFrame({'A': [5, 6, 3, 4], 'B': [1, 2, 3, 5]})
df

     A   B
0    5   1
1    6   2
2    3   3
3    4   5

How can we select rows based on a list of values in a column ('A' for instance), keeping the order of the list?

For instance

# from
list_of_values = [3,4,6]

# we would like, as a result
#      A   B
# 2    3   3
# 3    4   5
# 1    6   2

Using isin, as mentioned here, is not satisfactory because it does not keep the order of the input list of 'A' values.
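To make the problem concrete, here is a minimal sketch of what isin returns on the example frame. The name `filtered` is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [5, 6, 3, 4], 'B': [1, 2, 3, 5]})
list_of_values = [3, 4, 6]

# isin builds a boolean mask, so the rows come back in the frame's own order,
# not in the order of list_of_values
filtered = df[df['A'].isin(list_of_values)]
print(filtered['A'].tolist())  # [6, 3, 4], not [3, 4, 6]
```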

How can the abovementioned goal be achieved?

syltruong · Accepted Answer · 2018-08-24 03:48:16Z

13

One way to do this is to make the 'A' column the index and use loc on the resulting pandas.DataFrame. Afterwards, the index of the selected rows can be reset.

Here is how:

ret = df.set_index('A').loc[list_of_values].reset_index()

# ret is
#      A   B
# 0    3   3
# 1    4   5
# 2    6   2

Note that a drawback of this method is that the original index is lost in the process.
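If the original index matters, one possible variation (a sketch, not part of the answer as first written) is to move it into a column before the lookup and restore it afterwards. This assumes a default unnamed index, so that reset_index produces a column called 'index', and it assumes every value in the list occurs in 'A': in recent pandas versions, loc raises a KeyError for labels that are missing.

```python
import pandas as pd

df = pd.DataFrame({'A': [5, 6, 3, 4], 'B': [1, 2, 3, 5]})
list_of_values = [3, 4, 6]

ret = (
    df.reset_index()          # original labels become a column named 'index'
      .set_index('A')
      .loc[list_of_values]    # select rows in list order; fails on missing labels
      .reset_index()          # 'A' becomes a regular column again
      .set_index('index')     # restore the original labels as the index
      .rename_axis(None)      # drop the leftover index name
)
print(ret)
#    A  B
# 2  3  3
# 3  4  5
# 1  6  2
```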

More on pandas indexing: What is the point of indexing in pandas?

edited Aug 24, 2018 at 3:48

answered Aug 21, 2018 at 7:50

syltruong


3 Comments

jezrael Over a year ago

One question: are all values of list_of_values in the column? Is list_of_values = [3,4,6,7,7,4] possible?

syltruong Over a year ago

In practice, no, but it's true that this solution has the drawback of not handling values that are missing from the column.

jezrael Over a year ago

So the best would be the most general solution, one that keeps the original indices and works with duplicate values?

jezrael · Accepted Answer · 2018-08-24 06:29:17Z

6

Use merge with a helper DataFrame created from the list, whose column has the same name as the column to match:

import pandas as pd

df = pd.DataFrame({'A': [5, 6, 3, 4], 'B': [1, 2, 3, 5]})

list_of_values = [3, 6, 4]
df1 = pd.DataFrame({'A': list_of_values}).merge(df)
print(df1)
   A  B
0  3  3
1  6  2
2  4  5

For a more general solution:

df = pd.DataFrame({'A' : [5,6,5,3,4,4,6,5], 'B':range(8)})
print (df)
   A  B
0  5  0
1  6  1
2  5  2
3  3  3
4  4  4
5  4  5
6  6  6
7  5  7

list_of_values = [6,4,3,7,7,4]

# create a DataFrame from the list
list_df = pd.DataFrame({'A':list_of_values})
print (list_df)
   A
0  6
1  4
2  3
3  7
4  7
5  4

# keep the original index values as a column named 'index'
df1 = df.reset_index()
# helper column that numbers repeated values, so the n-th occurrence
# in the list is matched with the n-th occurrence in df
df1['g'] = df1.groupby('A').cumcount()
list_df['g'] = list_df.groupby('A').cumcount()

# merge on A and g, restore the index from the 'index' column and drop the g column
df = list_df.merge(df1).set_index('index').rename_axis(None).drop('g', axis=1)
print (df)
   A  B
1  6  1
4  4  4
3  3  3
5  4  5
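The steps above could be wrapped in a small reusable function. The following is only a sketch: `select_in_order` and the helper column `_g` are hypothetical names, and it assumes the frame has a default unnamed index and no existing column called 'index' or '_g'.

```python
import pandas as pd

def select_in_order(df, column, values):
    """Return rows of df whose `column` matches `values`, in the order of `values`.

    The n-th repeat of a value in `values` is paired with the n-th row of df
    holding that value; values with no matching row are silently dropped.
    """
    wanted = pd.DataFrame({column: list(values)})
    wanted['_g'] = wanted.groupby(column).cumcount()

    source = df.reset_index()  # original labels move to a column named 'index'
    source['_g'] = source.groupby(column).cumcount()

    # an inner merge keeps the order of the left-hand keys
    return (
        wanted.merge(source, on=[column, '_g'])
              .set_index('index')
              .rename_axis(None)
              .drop(columns='_g')
    )

data = pd.DataFrame({'A': [5, 6, 5, 3, 4, 4, 6, 5], 'B': range(8)})
out = select_in_order(data, 'A', [6, 4, 3, 7, 7, 4])
print(out)
```

With the data from this answer, the call should reproduce the four rows printed above (index 1, 4, 3, 5).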

edited Aug 24, 2018 at 6:29

answered Aug 21, 2018 at 8:05

jezrael


4 Comments

Zero Over a year ago

The original index is lost in the process.

jezrael Over a year ago

@Zero - Then you need df1 = pd.DataFrame({'A':list_of_values}).merge(df.reset_index()).set_index('index').rename_axis(None)

syltruong Over a year ago

Actually, I found out that this approach does not work when list_of_values contains repeated values: the order is not guaranteed to be kept. I'm sorry, I had to unmark it as the answer.

jezrael Over a year ago

@syltruong - I tried to create a more general solution for duplicate values (4) and also for unmatched values (7).

Zero · Accepted Answer · 2018-08-21 08:04:25Z

1

1] Generic approach for list_of_values.

In [936]: dff = df[df.A.isin(list_of_values)]

In [937]: dff.reindex(dff.A.map({x: i for i, x in enumerate(list_of_values)}).sort_values().index)
Out[937]:
   A  B
2  3  3
3  4  5
1  6  2

2] If list_of_values is sorted, you can use

In [926]: df[df.A.isin(list_of_values)].sort_values(by='A')
Out[926]:
   A  B
2  3  3
3  4  5
1  6  2
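As a possible variation on approach 1, the same position mapping can be passed to sort_values through its key argument. This is a sketch that assumes pandas 1.1 or later, where key was added; `rank` and `result` are illustrative names. If list_of_values contains repeated values, the dictionary keeps only the last position of each.

```python
import pandas as pd

df = pd.DataFrame({'A': [5, 6, 3, 4], 'B': [1, 2, 3, 5]})
list_of_values = [3, 4, 6]

# position of each value in the list, used as the sort key
rank = {value: position for position, value in enumerate(list_of_values)}

result = df[df['A'].isin(list_of_values)].sort_values(
    'A',
    key=lambda s: s.map(rank),  # sort by list position instead of by value
    kind='mergesort',           # stable sort, so ties keep their original order
)
print(result)
#    A  B
# 2  3  3
# 3  4  5
# 1  6  2
```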

edited Aug 21, 2018 at 8:04

answered Aug 21, 2018 at 7:59

Zero
