Python: which is a fast way to find index in pandas dataframe?

Question

I have a dataframe like the following:

df = 
    a   ID1         ID2         Proximity
0   0   900000498   NaN         0.000000
1   1   900000498   900004585   3.900000
2   2   900000498   900005562   3.900000
3   3   900000498   900008613   0.000000
4   4   900000498   900012333   0.000000
5   5   900000498   900019524   3.900000
6   6   900000498   900019877   0.000000
7   7   900000498   900020141   3.900000
8   8   900000498   900022133   3.900000
9   9   900000498   900022919   0.000000

Given an ID1-ID2 pair, I want to find the corresponding Proximity value. For instance, given the input [900000498, 900022133], I want 3.900000 as output.

EdChum · Accepted Answer · 2016-01-30 23:06:14Z

If this is a common operation, I'd set the index to those columns. You can then do the lookup with loc by passing a tuple of the column values:

In [60]:
df1 = df.set_index(['ID1','ID2'])

In [61]:
%timeit df1.loc[(900000498,900022133), 'Proximity']
%timeit df.loc[(df['ID1']==900000498)&(df['ID2']==900022133), 'Proximity']
1000 loops, best of 3: 565 µs per loop
100 loops, best of 3: 1.69 ms per loop

You can see that once the columns form the index, the lookup is about 3x faster than the boolean filter (565 µs versus 1.69 ms).
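For anyone who wants to try this, here is a minimal sketch that rebuilds the sample frame from the question and does the indexed lookup. The literal values are copied from the question; the variable name `value` is only illustrative, and the timings above will differ on other machines and pandas versions.

```python
import numpy as np
import pandas as pd

# Rebuild the question's sample frame. ID2 becomes float64 because of the NaN in row 0.
df = pd.DataFrame({
    "a": range(10),
    "ID1": [900000498] * 10,
    "ID2": [np.nan, 900004585, 900005562, 900008613, 900012333,
            900019524, 900019877, 900020141, 900022133, 900022919],
    "Proximity": [0.0, 3.9, 3.9, 0.0, 0.0, 3.9, 0.0, 3.9, 3.9, 0.0],
})

# Build the two-level index once, then reuse df1 for every lookup.
df1 = df.set_index(["ID1", "ID2"])

# A full (ID1, ID2) tuple selects the row; the second argument selects the column.
value = df1.loc[(900000498, 900022133), "Proximity"]
print(value)
```

Building the index has its own cost, so this only pays off when the same frame is queried many times.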

Both return the same value, though the index lookup gives a scalar while the filter gives a one-row Series:

In [63]:
print(df1.loc[(900000498,900022133), 'Proximity'])
print(df.loc[(df['ID1']==900000498)&(df['ID2']==900022133), 'Proximity'])

3.9
8    3.9
Name: Proximity, dtype: float64
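The difference in return type matters if the result feeds into further arithmetic. The sketch below, on three rows taken from the question's data, shows one way to get a plain value out of either approach. `lookup_proximity` is a hypothetical helper written for this illustration, not part of pandas; it assumes the (ID1, ID2) pairs are unique.

```python
import pandas as pd

# Three rows taken from the question's sample data.
df = pd.DataFrame({
    "ID1": [900000498, 900000498, 900000498],
    "ID2": [900019877, 900020141, 900022133],
    "Proximity": [0.0, 3.9, 3.9],
})
df1 = df.set_index(["ID1", "ID2"])

# Indexed lookup: a full (ID1, ID2) key on a unique index gives a scalar.
by_index = df1.loc[(900000498, 900022133), "Proximity"]

# Boolean filter: always a Series, even when only one row matches.
mask = (df["ID1"] == 900000498) & (df["ID2"] == 900022133)
by_filter = df.loc[mask, "Proximity"]

# .iloc[0] pulls the first (here, only) matching value out of the Series.
print(by_index, by_filter.iloc[0])


def lookup_proximity(indexed, id1, id2, default=None):
    """Hypothetical helper: Proximity for a pair, or `default` if the pair is absent."""
    try:
        return indexed.loc[(id1, id2), "Proximity"]
    except KeyError:
        # A pair that is not in the index raises KeyError.
        return default
```

Note that a missing pair raises `KeyError` with the indexed lookup, whereas the boolean filter quietly returns an empty Series.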

edited Jan 30, 2016 at 23:06

answered Jan 30, 2016 at 22:58
