3

How can I get the underlying integer positions of the rows in a DataFrame with a DatetimeIndex? Some rows have been removed, so the values may not be sequential.

For example:

import numpy as np
import pandas as pd

dates = pd.date_range('1/1/2000', periods=8)
df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=['A', 'B', 'C', 'D'])
df
Out[3]: 
                   A         B         C         D
2000-01-01  0.469112 -0.282863 -1.509059 -1.135632
2000-01-02  1.212112 -0.173215  0.119209 -1.044236
2000-01-03 -0.861849 -2.104569 -0.494929  1.071804
2000-01-04  0.721555 -0.706771 -1.039575  0.271860
2000-01-05 -0.424972  0.567020  0.276232 -1.087401
2000-01-06 -0.673690  0.113648 -1.478427  0.524988
2000-01-07  0.404705  0.577046 -1.715002 -1.039268
2000-01-08 -0.370647 -1.157892 -1.344312  0.844885

For df.index, instead of

DatetimeIndex(['2001-01-01', '2001-01-02', '2001-01-03' ...], dtype='datetime64[ns]', freq=None)

I would like to get

[1, 2, 4, ...,26, etc]

Is this possible in a way that does not involve df.reset_index()?

3
  • are you thinking of using the date part of the index (01, 25, ..) and the max integer may not be greater than 31? Commented Jan 17, 2020 at 22:53
  • No. I am thinking of using the underlying integer index for DataFrame. Commented Jan 17, 2020 at 22:56
  • There is no underlying integer index. If you drop the DatetimeIndex, Pandas will default to a RangeIndex set to the number of rows in your DataFrame. Commented Jan 17, 2020 at 22:58

4 Answers

5

To get this:

[1, 2, 4, ...,26, etc]

without using df.reset_index() (i.e., leaving the DatetimeIndex as is), you can iterate over a range as long as the index itself:

range(df.index.shape[0])

To get a list directly, pass that range to list() (a list comprehension is not needed here):

list(range(df.index.shape[0]))

You might also want to check df.index.get_loc(), which returns the integer location of a specific requested index label.

unique_index = pd.Index(list('abc'))
unique_index.get_loc('b')

>>> 1

All of this is assuming you will be using this list for something other than direct DataFrame indexing of course. (Because obviously this won't work for that!)
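As a rough sketch of both ideas on a frame with removed rows (the dropped dates and the variable names `positions` and `loc` are made up for illustration): the positions are always 0 to len - 1 in the current frame, whatever gaps the dates have, and get_loc maps a single label to its current position.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2000-01-01", periods=8)
df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=list("ABCD"))

# Drop two rows so the dates are no longer consecutive (arbitrary choice).
df = df.drop([pd.Timestamp("2000-01-03"), pd.Timestamp("2000-01-06")])

# Positional integers for the rows that remain: always 0..len-1.
positions = list(range(len(df.index)))

# Position of one label in the current (reduced) index.
loc = df.index.get_loc(pd.Timestamp("2000-01-04"))

print(positions)  # [0, 1, 2, 3, 4, 5]
print(loc)        # 2
```

Note that these are positions in the frame as it is now, not positions in the frame before the rows were removed.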


1

You can get the integer positions from a DatetimeIndex with:

import numpy as np
np.where(df.index.isin(df.index))

Output:

(array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16],
       dtype=int64),)
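A small sketch of how the same pattern behaves, assuming a made-up four-entry index `idx` and a made-up subset `wanted`: testing the index against itself simply returns every position, while testing it against a subset of labels returns only the positions of those labels.

```python
import numpy as np
import pandas as pd

# Hypothetical index with a gap (2000-01-03 is missing).
idx = pd.DatetimeIndex(["2000-01-01", "2000-01-02", "2000-01-04", "2000-01-05"])
wanted = pd.DatetimeIndex(["2000-01-02", "2000-01-05"])

# isin gives a boolean mask; np.where turns it into integer positions.
# np.where returns a tuple, so take element 0 to get the array.
where_positions = np.where(idx.isin(wanted))[0]
all_positions = np.where(idx.isin(idx))[0]

print(where_positions)  # [1 3]
print(all_positions)    # [0 1 2 3]
```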


1

df.index.get_loc(idx) was faster than np.where(df.index.isin([idx])) in my timing:

idx = '2016-08-03 16:00:00'
print(df.shape)
%timeit df.index.get_loc(idx)
print(df.index.get_loc(idx))
%timeit np.where(df.index.isin([idx]))
print(np.where(df.index.isin([idx])))

output:

(121275, 1)
187 µs ± 1.94 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
103770
885 µs ± 11.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
(array([103770]),)
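If you want to repeat the comparison outside IPython, something like the following should work with the standard timeit module. The index size, frequency and chosen label are arbitrary stand-ins for the data above, and the timings will differ by machine and pandas version, so none are shown.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic minute-frequency index; size and start date are arbitrary.
big_index = pd.date_range("2016-01-01", periods=100_000, freq="min")
label = big_index[70_000]

# Both approaches should agree on the position of the label.
pos_get_loc = big_index.get_loc(label)
pos_where = np.where(big_index.isin([label]))[0][0]

# Time 100 calls of each approach.
t_get_loc = timeit.timeit(lambda: big_index.get_loc(label), number=100)
t_where = timeit.timeit(lambda: np.where(big_index.isin([label])), number=100)

print(f"get_loc: {t_get_loc:.4f}s  np.where/isin: {t_where:.4f}s")
```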


0

I didn't profile it, so I'm not sure how well it performs, but if you need to convert some index labels to their integer (iloc) positions, you can use this:

index_subset = pd.DatetimeIndex(['2001-01-01', '2001-01-02', '2001-01-03'], freq=None)
numeric_indexes = df.index.get_indexer(index_subset)

index_subset doesn't need to be a contiguous slice; it can be any subset of the index.
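A possible way to use get_indexer for the case in the question, where rows were removed: if the original, complete index is still available (here rebuilt as `full_index`, an assumption on my part), looking up the remaining labels in it gives their positions before the removal. The removed positions below are chosen only for illustration.

```python
import pandas as pd

# The complete index before any rows were removed (assumed to be known).
full_index = pd.date_range("2000-01-01", periods=8)

# Simulate removing the 3rd and 6th rows.
kept = full_index.delete([2, 5])

# Positions of the surviving labels in the ORIGINAL index.
original_positions = full_index.get_indexer(kept)

# Positions of an arbitrary subset of labels in the CURRENT index.
subset = pd.DatetimeIndex(["2000-01-04", "2000-01-08"])
current_positions = kept.get_indexer(subset)

# Labels that are not present come back as -1 rather than raising.
missing = kept.get_indexer(pd.DatetimeIndex(["2000-01-03"]))

print(original_positions)  # [0 1 3 4 6 7]
print(current_positions)   # [2 5]
print(missing)             # [-1]
```

Unlike get_loc, get_indexer takes many labels at once and does not raise a KeyError for absent labels, so check for -1 before using the result with iloc.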
