I am new to Python and have a very simple question. I have a list of indices that correspond to some of the row indices of a data frame. What is the best way to use this list (preserving the order of its items) to subset the data frame?
2 Answers
Use iloc:
import numpy as np
import pandas as pd
np.random.seed(0)
df = pd.DataFrame(np.random.randint(100, 200, (10, 2)), columns=['a', 'b'])
print(df, end='\n\n')
print(df.iloc[[7, 2, 3, 1, 6]])
Output:
     a    b
0  144  147
1  164  167
2  167  109
3  183  121
4  136  187
5  170  188
6  188  112
7  158  165
8  139  187
9  146  188

     a    b
7  158  165
2  167  109
3  183  121
1  164  167
6  188  112
If your list instead holds values from a column, you can merge. A left merge on that column keeps the rows in the order of the list:
values = [158, 167, 183, 164, 188]
print(pd.merge(pd.DataFrame(values, columns=['a']), df, on='a', how='left'))
Output:
     a    b
0  158  165
1  167  109
2  183  121
3  164  167
4  188  112
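As a possible alternative to the merge, here is a minimal sketch that temporarily makes column a the index and selects by label with loc. It writes the example data out explicitly rather than relying on the random seed, and it assumes every value in the list actually occurs in the column (loc raises a KeyError otherwise).

```python
import pandas as pd

# Same data as the example above, written out explicitly.
df = pd.DataFrame({
    'a': [144, 164, 167, 183, 136, 170, 188, 158, 139, 146],
    'b': [147, 167, 109, 121, 187, 188, 112, 165, 187, 188],
})
values = [158, 167, 183, 164, 188]

# Use column 'a' as the index, select the labels in list order,
# then turn 'a' back into an ordinary column.
subset = df.set_index('a').loc[values].reset_index()
print(subset)
```

If column a contained duplicates, loc would return every matching row for each value, so the result could be longer than the list.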
3 Comments
user5054
Thank you, very helpful! How can I do similar subsetting when my list corresponds to elements of the first column rather than to the data frame's indices?
gmds
@user5054 That would require a fairly different method. Is that in fact your problem?
user5054
Yes, this is my actual problem, @gmds. I tried to get the indices of the matching first-column elements with
indices = [data_fr[colname] == g for g in listitems]
thinking I could then apply what you suggested, but I realized these are all logicals (and, weirdly, all False), not indices. I think it would be more efficient to index directly on the elements of column colname.
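A rough sketch of what that comprehension produces, and one way to get real row positions for iloc instead. The frame contents and the values of colname and listitems here are made-up stand-ins, and the sketch assumes the column values are unique. If every comparison comes back False, one thing worth checking (a guess, not a diagnosis) is whether the list items and the column have the same type, for example strings versus integers.

```python
import pandas as pd

# Hypothetical stand-ins for data_fr, colname and listitems from the comment.
data_fr = pd.DataFrame({'name': ['x', 'y', 'z', 'w'], 'score': [1, 2, 3, 4]})
colname = 'name'
listitems = ['z', 'x']

# Each comparison yields a boolean Series (one True/False per row),
# not a row position.
masks = [data_fr[colname] == g for g in listitems]

# isin builds a single mask, but the rows come back in the frame's order,
# not the list's order.
in_frame_order = data_fr[data_fr[colname].isin(listitems)]

# Index.get_indexer returns the integer position of each item (-1 if absent),
# so the result can be passed to iloc in list order.
positions = pd.Index(data_fr[colname]).get_indexer(listitems)
assert (positions >= 0).all(), "some list items are not in the column"
in_list_order = data_fr.iloc[positions]
print(in_list_order)
```

The check on positions matters because iloc would silently treat -1 as "last row" rather than "missing".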