I'm reading temperature entries stored in a file. Each entry is generated when the temperature value changes, so entries are not stored at regular intervals.

An example of the data could be as follows:

timestamp  | temperature
-----------+------------
1477400000 | 31
1477400001 | 31.5
1477400003 | 32
1477400010 | 31.5
1477400200 | 32
1477400201 | 32.5

I need a fast way to get the temperature at any timestamp, even one that is not in the index. For instance, the temperature at 1477400002 would be 31.5, although 1477400002 is not in the index.

For easier reproducibility, the same DataFrame can be generated as follows:

import pandas as pd
df = pd.DataFrame(data={'temperature': [31, 31.5, 32, 31.5, 32, 32.5]},
                  index=[1477400000, 1477400001, 1477400003, 1477400010, 1477400200, 1477400201])
  • Why should it return 31.5 rather than 32? Commented Oct 25, 2016 at 12:24
  • @kiril, do the index values repeat themselves as shown in your MCVE? Commented Oct 25, 2016 at 12:43
  • It should return 31.5 because the value 32 was set before that timestamp Commented Oct 25, 2016 at 13:15
  • oh, no sorry, it was a mistake, I'll update it Commented Oct 25, 2016 at 13:15

2 Answers

Assuming that the index is sorted, you can use np.searchsorted to return the ordinal position and use iloc to index into the df:

In [84]:
df.iloc[max(0, np.searchsorted(df.index, 1477400002) - 1)]

Out[84]:
temperature    31.5
Name: 1477400001, dtype: float64

Here I subtract 1 from the result of np.searchsorted to get the lower bound. To guard against the case where it returns the first position, I also take the max of 0 and the result, so looking up 1477400000 still returns the first entry.
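One thing to watch: with the default side='left', a query that exactly matches an index value lands on the entry before it. Below is a minimal sketch of a variant that uses side='right' so exact matches return their own entry. The helper name temperature_at and the choice to return None for queries before the first entry are my own assumptions, not part of the original snippet.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    data={'temperature': [31, 31.5, 32, 31.5, 32, 32.5]},
    index=[1477400000, 1477400001, 1477400003, 1477400010, 1477400200, 1477400201],
)

def temperature_at(frame, timestamp):
    """Hypothetical helper: last recorded temperature at or before `timestamp`.

    Assumes the index is sorted in ascending order.
    """
    # side='right' gives the insertion point after any equal entry,
    # so subtracting 1 lands on the entry at or just before the query.
    pos = np.searchsorted(frame.index.values, timestamp, side='right') - 1
    if pos < 0:
        # Assumption: a query before the first entry has no known value.
        return None
    return frame['temperature'].iloc[pos]

print(temperature_at(df, 1477400002))  # between two entries
print(temperature_at(df, 1477400003))  # exact match
print(temperature_at(df, 1477300999))  # before the first entry
```

With the sample data this should give 31.5 for the in-between query and 32.0 for the exact match.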

You can also use the index.get_loc method and set its argument method='pad' to find the previous index value when no exact match is found. Then use DF.get_value to retrieve the value for the column of interest, temperature, at the index label found by that lookup (accessed through the name attribute), as shown:

Demo:

df.get_value(df.iloc[df.index.get_loc(1477400002, method='pad')].name, 'temperature')
# 31.5

df.get_value(df.iloc[df.index.get_loc(1477400003, method='pad')].name, 'temperature')
# 32.0

It is assumed that queries start at or after the first index value, since you want the previous value at any given point in time.

Timings:

%timeit df.get_value(df.iloc[df.index.get_loc(1477400002, method='pad')].name, 'temperature')
1000 loops, best of 3: 164 µs per loop
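On recent pandas versions, as far as I know, DataFrame.get_value and the method argument of Index.get_loc are no longer available. Here is a rough sketch of the same "pad" lookup using Index.get_indexer, which also accepts several timestamps at once. The helper name temperature_asof and the choice to return NaN for queries before the first entry are assumptions for illustration; the index must be sorted and unique for method='pad' to work.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    data={'temperature': [31, 31.5, 32, 31.5, 32, 32.5]},
    index=[1477400000, 1477400001, 1477400003, 1477400010, 1477400200, 1477400201],
)

def temperature_asof(frame, timestamps):
    """Hypothetical helper: pad-style lookup for a list of timestamps."""
    # For each query, get_indexer with method='pad' returns the position of
    # the last index value <= the query, or -1 if there is none.
    positions = frame.index.get_indexer(timestamps, method='pad')
    values = frame['temperature'].to_numpy()
    # Assumption: queries before the first entry are reported as NaN.
    looked_up = np.where(positions >= 0, values[positions], np.nan)
    return pd.Series(looked_up, index=timestamps, name='temperature')

print(temperature_asof(df, [1477300999, 1477400002, 1477400003]))
```

With the sample data, the three queries should map to NaN, 31.5 and 32.0 respectively.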

7 Comments

Can you add timings?
Yeah, I've added it. EdChum's answer is twice as fast as mine, although it returns the previous entry when the query exactly matches an index value, whereas mine returns the matching entry.
Unfortunately, it fails for this key: print(df.get_value(df.iloc[df.index.get_loc(1477300999, method='pad')].name, 'temperature'))
Hmmm, I think get_value is not necessary - use (df.iloc[df.index.get_loc(1477400002, method='pad')])
OK, I'll try a similar solution. +1