Query dataframe using index value between index entries

Question

I'm reading temperature entries stored in a file. Each entry is generated when the temperature value changes, so entries are not stored at regular intervals.

An example of the data could be as follows:

timestamp  | temperature
-----------+------------
1477400000 | 31
1477400001 | 31.5
1477400003 | 32
1477400010 | 31.5
1477400200 | 32
1477400201 | 32.5

I need a fast way to get the temperature at any timestamp, even if it is not in the index. For instance, the temperature at 1477400002 would be 31.5, but 1477400002 is not in the index.

For easier reproducibility, the same dataframe can be generated as follows:

import pandas as pd

df = pd.DataFrame(data={'temperature': [31, 31.5, 32, 31.5, 32, 32.5]},
                  index=[1477400000, 1477400001, 1477400003, 1477400010, 1477400200, 1477400201])

@kiril, do the index values repeat as shown in your MCVE?
– Nickil Maveli, Commented Oct 25, 2016 at 12:43
It should return 31.5 because the value 32 was only set after that timestamp
– kiril, Commented Oct 25, 2016 at 13:15

EdChum · Accepted Answer · 2016-10-25 12:24:13Z

2

Assuming that the index is sorted, you can use np.searchsorted to return the ordinal position and use iloc to index into the df:

In [84]:
df.iloc[max(0, np.searchsorted(df.index, 1477400002) - 1)]

Out[84]:
temperature    31.5
Name: 1477400001, dtype: float64

Here I subtract 1 from the result of np.searchsorted to get the lower bound. To guard against the case where it returns the first position (0), I also take the max of 0 and that value, so if you look up 1477400000 this will still return the first entry.
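One thing to watch with the snippet above: np.searchsorted defaults to side='left', so a query that exactly matches an index entry (say 1477400003) lands on the previous row. A minimal sketch of a variant that uses side='right' instead, so exact matches return their own row, could look like this. The helper name temperature_at and the choice to raise KeyError for queries before the first entry are assumptions made for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    data={'temperature': [31, 31.5, 32, 31.5, 32, 32.5]},
    index=[1477400000, 1477400001, 1477400003, 1477400010, 1477400200, 1477400201],
)

def temperature_at(frame, timestamp):
    """Return the last temperature recorded at or before `timestamp`.

    Hypothetical helper; assumes the index is sorted in ascending order.
    """
    # side='right' gives the insertion point after any equal entry, so
    # subtracting 1 lands on the exact match or on the previous entry.
    pos = np.searchsorted(frame.index.to_numpy(), timestamp, side='right') - 1
    if pos < 0:
        # Assumed policy: no reading exists before the first entry.
        raise KeyError(timestamp)
    return frame['temperature'].iloc[pos]

print(temperature_at(df, 1477400002))  # between two entries
print(temperature_at(df, 1477400003))  # exact match
```

Whether a query earlier than the first timestamp should raise, return NaN, or fall back to the first entry depends on the use case; the original snippet falls back to the first entry.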

answered Oct 25, 2016 at 12:24

EdChum


Nickil Maveli · Accepted Answer · 2016-10-25 13:55:08Z

2

You can also use the index.get_loc method with its argument method='pad' to find the previous index value when no exact match is found. Then use df.get_value to retrieve the value for the column of interest, temperature, at the index label returned by that lookup (accessed through the row's name attribute), as shown:

Demo:

df.get_value(df.iloc[df.index.get_loc(1477400002, method='pad')].name, 'temperature')
# 31.5

df.get_value(df.iloc[df.index.get_loc(1477400003, method='pad')].name, 'temperature')
# 32.0

It is assumed that the query would begin after the first index, as you would want the previous value at any given point in time.
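Note for readers on newer pandas: as far as I know, DataFrame.get_value was removed in pandas 1.0 and the method argument of Index.get_loc was removed in pandas 2.0, so the demo above only runs on older releases. A rough equivalent that should work on current versions uses Index.get_indexer, which accepts method='pad' and returns -1 when no earlier label exists. The helper name lookup_pad and the choice to return None for that case are assumptions made for this sketch.

```python
import pandas as pd

df = pd.DataFrame(
    data={'temperature': [31, 31.5, 32, 31.5, 32, 32.5]},
    index=[1477400000, 1477400001, 1477400003, 1477400010, 1477400200, 1477400201],
)

def lookup_pad(frame, timestamp):
    """Hypothetical helper: value at the last index label <= timestamp."""
    # get_indexer takes a list of keys; method='pad' picks the last label
    # at or before each key and yields -1 where no such label exists.
    pos = frame.index.get_indexer([timestamp], method='pad')[0]
    if pos == -1:
        # Assumed policy: signal "no earlier reading" with None.
        return None
    return frame['temperature'].iloc[pos]

print(lookup_pad(df, 1477400002))
print(lookup_pad(df, 1477400003))
```

Unlike get_loc with method='pad', this does not raise for a key that precedes the first index entry; it reports the missing match through the -1 position.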

Timings:

%timeit df.get_value(df.iloc[df.index.get_loc(1477400002, method='pad')].name, 'temperature')
1000 loops, best of 3: 164 µs per loop
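If many timestamps need to be resolved at once, the per-call overhead measured above can be avoided by passing the whole array of queries to np.searchsorted in a single call. The following is only a sketch under assumptions: the helper name temperatures_at and the sample query values are made up for illustration, the index is assumed to be sorted, and queries before the first entry are reported as NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    data={'temperature': [31, 31.5, 32, 31.5, 32, 32.5]},
    index=[1477400000, 1477400001, 1477400003, 1477400010, 1477400200, 1477400201],
)

def temperatures_at(frame, timestamps):
    """Hypothetical helper: vectorised 'last value at or before' lookup."""
    # One searchsorted call handles every query; side='right' keeps
    # exact matches on their own row.
    pos = np.searchsorted(frame.index.to_numpy(), timestamps, side='right') - 1
    values = frame['temperature'].to_numpy(dtype=float)
    # pos == -1 means the query precedes the first entry. Clip before
    # indexing so -1 does not silently wrap to the last row, then mask.
    return np.where(pos >= 0, values[np.clip(pos, 0, None)], np.nan)

# Made-up query timestamps for illustration.
queries = np.array([1477400002, 1477400003, 1477400150, 1477400300])
print(temperatures_at(df, queries))
```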

edited Oct 25, 2016 at 13:55

answered Oct 25, 2016 at 13:46

Nickil Maveli


7 Comments

jezrael Over a year ago

Can you add timings?

Nickil Maveli Over a year ago

Yeah, I've added them. EdChum's answer is about twice as fast as mine, although when the query exactly matches an index entry it returns the previous entry's value, whereas mine returns the value at the matching entry.

jezrael Over a year ago

Unfortunately it fails for this key: print(df.get_value(df.iloc[df.index.get_loc(1477300999, method='pad')].name, 'temperature'))

jezrael Over a year ago

Hmmm, I think get_value is not necessary - use (df.iloc[df.index.get_loc(1477400002, method='pad')])

jezrael Over a year ago

OK, I'll try a similar solution. +1
