
I'm trying to understand why I get this error. I already have a solution, and the issue was actually solved here; I just need to understand why it doesn't work as I expected.

I would like to understand why this throws a KeyError:

import numpy as np
import pandas as pd
dates = pd.date_range('20130101', periods=4)
df = pd.DataFrame(np.identity(4), index=dates, columns=list('ABCD'))
df.loc[['20130102', '20130103'],:]

with the following feedback:

KeyError: "None of [['20130102', '20130103']] are in the [index]"

As explained here, the solution is just to do:

df.loc[pd.to_datetime(['20130102','20130104']),:]
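A minimal sketch of that fix, rebuilding the frame from the question (the variable names keys and subset are only illustrative). Converting the strings with pd.to_datetime hands .loc a DatetimeIndex whose values have the same type as the index values:

```python
import numpy as np
import pandas as pd

# Same frame as in the question: a 4-day DatetimeIndex and an identity matrix
dates = pd.date_range('20130101', periods=4)
df = pd.DataFrame(np.identity(4), index=dates, columns=list('ABCD'))

# Parse the string labels into Timestamps before passing them to .loc,
# so every key has the same type as the values stored in the index
keys = pd.to_datetime(['20130102', '20130103'])
subset = df.loc[keys, :]
print(subset)
```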

So the problem is definitely with the way loc handles a list of strings when selecting from a DatetimeIndex. However, the following calls work fine:

df.loc['20130102':'20130104',:]

and

df.loc['20130102']

I would like to understand how this works and would appreciate any resources I can use to predict the behavior of this function depending on how it is called. I read "Indexing and Selecting Data" and "Time Series / Date functionality" in the pandas documentation but couldn't find an explanation for this.

  • If anyone has a similar issue, what solved it for me was removing the duplicate indices: ` df = df.loc[~df.index.duplicated(keep='first')]; sliced_df = df[start_time:end_time] ` Commented Aug 20, 2020 at 12:10
  • As well as sorting the index: df = df.sort_index() Commented Aug 20, 2020 at 13:48
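A minimal sketch of the advice in those two comments, on a small made-up frame (the data and variable names are hypothetical) whose index is unsorted and contains a duplicated timestamp. It uses .loc for the slice rather than the plain [] from the comment:

```python
import pandas as pd

# Hypothetical data: unsorted DatetimeIndex with 2013-01-02 appearing twice
idx = pd.to_datetime(['2013-01-03', '2013-01-01', '2013-01-02', '2013-01-02'])
df = pd.DataFrame({'A': [3, 1, 2, 99]}, index=idx)

# Keep only the first row for each duplicated label, then sort by the index
df = df.loc[~df.index.duplicated(keep='first')]
df = df.sort_index()

# With a unique, sorted index, a string slice selects the expected rows
sliced_df = df.loc['2013-01-01':'2013-01-02']
print(sliced_df)
```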

1 Answer


Typically, when you pass an array-like object to loc, Pandas tries to locate each element of that array in the index. If it doesn't find them, you get a KeyError. And you passed an array of strings while the values in the index are Timestamps, so those strings definitely aren't in the index.
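To make the type mismatch concrete, here is a small sketch on the question's frame (string_keys and parsed_keys are illustrative names). It prints the element type on each side of the lookup, then checks that the parsed versions of the keys do match values stored in the index:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=4)
df = pd.DataFrame(np.identity(4), index=dates, columns=list('ABCD'))

string_keys = ['20130102', '20130103']

# Element types on each side of the lookup
print(type(df.index[0]).__name__)               # Timestamp
print({type(k).__name__ for k in string_keys})  # {'str'}

# After parsing, every key corresponds to a value stored in the index
parsed_keys = pd.to_datetime(string_keys)
print(parsed_keys.isin(df.index).all())         # True
```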

However, Pandas also tries to make things easier for you. In particular, with a DatetimeIndex, if you were to pass a string scalar

df.loc['20130102']

A    0.0
B    1.0
C    0.0
D    0.0
Name: 2013-01-02 00:00:00, dtype: float64

Pandas will attempt to parse that scalar as a Timestamp and see if that value is in the index.
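As a quick check of that behavior, this sketch rebuilds the question's frame and compares the string lookup with an explicit Timestamp lookup (by_string and by_timestamp are illustrative names):

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=4)
df = pd.DataFrame(np.identity(4), index=dates, columns=list('ABCD'))

# The scalar string is parsed into a date before the lookup, so both
# calls return the same row as a Series
by_string = df.loc['20130102']
by_timestamp = df.loc[pd.Timestamp('2013-01-02')]

print(by_string.equals(by_timestamp))  # True
print(by_string.name)                  # 2013-01-02 00:00:00
```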

If you were to pass a slice object

df.loc['20130102':'20130104']

              A    B    C    D
2013-01-02  0.0  1.0  0.0  0.0
2013-01-03  0.0  0.0  1.0  0.0
2013-01-04  0.0  0.0  0.0  1.0

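Pandas will also attempt to parse the start and stop of the slice as Timestamps and return an appropriately sliced DataFrame. As with any label-based slice in loc, both endpoints are included, which is why 2013-01-04 appears in the result.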
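A similar sketch for the slice case, again on the question's frame (by_strings and by_timestamps are illustrative names). It compares a string slice with the equivalent Timestamp slice and confirms that the stop label is included:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=4)
df = pd.DataFrame(np.identity(4), index=dates, columns=list('ABCD'))

# Both string bounds are parsed as dates; label slicing keeps both endpoints
by_strings = df.loc['20130102':'20130104']
by_timestamps = df.loc[pd.Timestamp('2013-01-02'):pd.Timestamp('2013-01-04')]

print(by_strings.equals(by_timestamps))  # True
print(len(by_strings))                   # 3
```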

Your KeyError is simply past the limits of how much helpfulness the Pandas devs had time to code: a list of strings doesn't get the same parsing treatment as a string scalar or a string slice.
