I am new to pandas (ver 0.14.0) and have encountered the following problem:

I am trying to slice a pandas DataFrame using a MultiIndex whose first level is a timestamp. Slicing with only a date works fine, but slicing with a date and a time returns either an empty frame or an exception.

What is the proper way to slice a timestamp which includes both the date and time?

UPDATE: What is the proper way to slice a timestamp and the other indices and columns?

Here is my code:

import datetime
import numpy as np
import pandas as pd

dates = pd.DatetimeIndex([datetime.datetime(2012,1,1,12,12,12)+datetime.timedelta(days = i) for i in range(6)])
freq = [1,2]
iterables = [dates, freq]

index = pd.MultiIndex.from_product(iterables, names=['date','frequency'])
df = pd.DataFrame(np.random.randn(6*2,4),index=index,columns=list('ABCD'))

print df.loc[(slice(None), slice(None)),:] # works
print df.loc[(slice(None), slice(1,1)),:] # works
df.loc[(slice('2012-01-01 12:12:12','2012-01-03 12:12:12'), slice(None)),:] # returns empty

Returns:

                                      A         B         C         D
date                frequency                                        
2012-01-01 12:12:12 1          0.903078 -0.250419  0.191373  0.491633
                    2         -2.571769  1.906471 -0.712225  0.255760
2012-01-02 12:12:12 1          1.056798 -0.753387  0.509417  2.001925
                    2         -0.746595  0.435158  0.955275 -1.854974
2012-01-03 12:12:12 1          0.139800 -0.728467 -1.196661  0.201817
                    2         -0.006282 -0.644041  0.138642 -1.232355
2012-01-04 12:12:12 1         -0.895909  0.504779 -0.803993  1.306559
                    2          0.268643 -0.642575 -0.573525  0.914382
2012-01-05 12:12:12 1          0.608634 -2.650082 -0.404462  0.593098
                    2         -0.376576 -1.514299 -1.053566  0.130654
2012-01-06 12:12:12 1          0.658660 -0.575514  0.665777 -1.282307
                    2          0.162896  0.302550  1.609635 -2.146004
                                      A         B         C         D
date                frequency                                        
2012-01-01 12:12:12 1          0.903078 -0.250419  0.191373  0.491633
2012-01-02 12:12:12 1          1.056798 -0.753387  0.509417  2.001925
2012-01-03 12:12:12 1          0.139800 -0.728467 -1.196661  0.201817
2012-01-04 12:12:12 1         -0.895909  0.504779 -0.803993  1.306559
2012-01-05 12:12:12 1          0.608634 -2.650082 -0.404462  0.593098
2012-01-06 12:12:12 1          0.658660 -0.575514  0.665777 -1.282307

Empty DataFrame
Columns: [A, B, C, D]
Index: []

Or if I try the following it returns an error:

df.loc[(slice(dates[0],dates[2]), slice(None)),:]

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-126-016ed3a2c8ff> in <module>()
----> 1 df.loc[(slice(dates[0],dates[2]), slice(None)),:]
      2 #print df.loc[(slice(pd.to_datetime(datetime.datetime(2013, 1, 2, 2, 3, 40)),pd.to_datetime(datetime.datetime(2013, 1, 3, 2, 3, 40))), 1),:]

C:\Anaconda\lib\site-packages\pandas\core\indexing.pyc in __getitem__(self, key)
   1125     def __getitem__(self, key):
   1126         if type(key) is tuple:
-> 1127             return self._getitem_tuple(key)
   1128         else:
   1129             return self._getitem_axis(key, axis=0)

C:\Anaconda\lib\site-packages\pandas\core\indexing.pyc in _getitem_tuple(self, tup)
    643     def _getitem_tuple(self, tup):
    644         try:
--> 645             return self._getitem_lowerdim(tup)
    646         except IndexingError:
    647             pass

C:\Anaconda\lib\site-packages\pandas\core\indexing.pyc in _getitem_lowerdim(self, tup)
    751         # we may have a nested tuples indexer here
    752         if self._is_nested_tuple_indexer(tup):
--> 753             return self._getitem_nested_tuple(tup)
    754 
    755         # we maybe be using a tuple to represent multiple dimensions here

C:\Anaconda\lib\site-packages\pandas\core\indexing.pyc in _getitem_nested_tuple(self, tup)
    823 
    824             current_ndim = obj.ndim
--> 825             obj = getattr(obj, self.name)._getitem_axis(key, axis=axis, validate_iterable=True)
    826             axis += 1
    827 

C:\Anaconda\lib\site-packages\pandas\core\indexing.pyc in _getitem_axis(self, key, axis, validate_iterable)
   1254             return self._getitem_iterable(key, axis=axis)
   1255         elif _is_nested_tuple(key, labels):
-> 1256             locs = labels.get_locs(key)
   1257             indexer = [ slice(None) ] * self.ndim
   1258             indexer[axis] = locs

C:\Anaconda\lib\site-packages\pandas\core\index.pyc in get_locs(self, tup)
   3580                     np.logical_or,[ _convert_indexer(self._get_level_indexer(x, level=i)
   3581                                                      ) for x in k ]))
-> 3582             elif k == slice(None):
   3583                 # include all from this level
   3584                 pass

C:\Anaconda\lib\site-packages\pandas\tslib.pyd in pandas.tslib._Timestamp.__richcmp__ (pandas\tslib.c:13056)()

TypeError: Cannot compare type 'Timestamp' with type 'NoneType'

This fails as well:

df.loc[(slice(pd.Timestamp('2012-01-01 12:12:12'),pd.Timestamp('2012-01-03 12:12:12')),slice(1,1)), slice('A','B')]

UPDATE: The following works, but it takes two steps rather than one:

df_temp = df.loc[(slice(pd.Timestamp('2012-01-01 12:12:12'),pd.Timestamp('2012-01-03 12:12:12'))), slice('A','B')]
df_temp.loc[(slice(None),slice(1,1)),:]

                               A         B
date                frequency                    
2012-01-01 12:12:12 1          0.840330 -0.051184
2012-01-02 12:12:12 1         -0.468037 -0.012381
2012-01-03 12:12:12 1         -0.061229  0.613407

2 Answers

5

You can slice on the Timestamps rather than the strings:

In [11]: df.loc[(slice(pd.Timestamp('2012-01-01 12:12:12'),pd.Timestamp('2012-01-03 12:12:12')))]
Out[11]:
                                      A         B         C         D
date                frequency
2012-01-01 12:12:12 1          0.796501 -0.914335  1.205684  0.707926
                    2          0.659782 -0.823599  0.786772 -1.265034
2012-01-02 12:12:12 1          0.907892  1.248585 -0.037800 -0.893048
                    2         -0.595936 -0.286499  0.595300 -0.359440
2012-01-03 12:12:12 1          0.145403  0.621906  0.865768 -0.228813
                    2          1.169412  0.213809  0.551384  0.870852

In [12]: df.loc[(slice(pd.Timestamp('2012-01-01 12:12:12'),pd.Timestamp('2012-01-03 12:12:12')), slice(None))]
Out[12]:
                                      A         B         C         D
date                frequency
2012-01-01 12:12:12 1          0.796501 -0.914335  1.205684  0.707926
                    2          0.659782 -0.823599  0.786772 -1.265034
2012-01-02 12:12:12 1          0.907892  1.248585 -0.037800 -0.893048
                    2         -0.595936 -0.286499  0.595300 -0.359440
2012-01-03 12:12:12 1          0.145403  0.621906  0.865768 -0.228813
                    2          1.169412  0.213809  0.551384  0.870852

I think the fact strings work for slicing is pretty mad!


That said, I can't seem to get slicing on both levels to work with the following:

df.loc[(slice(pd.Timestamp('2012-01-01 12:12:12'),pd.Timestamp('2012-01-03 12:12:12')), slice(1, 1))]
KeyError: 'start bound [1] is not the [columns]'
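One likely reading of this KeyError: with no column indexer given, the two-element tuple is taken as (rows, columns), so slice(1, 1) is looked up in the columns rather than in the frequency level. A minimal sketch of the usual workaround follows, which is to spell out both axes. It assumes a recent pandas release in which the datetime slicer bugs mentioned in the comments are fixed; the names start, stop and result are introduced here for illustration only.

```python
import datetime

import numpy as np
import pandas as pd

# Rebuild the frame from the question.
dates = pd.DatetimeIndex(
    [datetime.datetime(2012, 1, 1, 12, 12, 12) + datetime.timedelta(days=i)
     for i in range(6)]
)
index = pd.MultiIndex.from_product([dates, [1, 2]], names=["date", "frequency"])
df = pd.DataFrame(np.random.randn(12, 4), index=index, columns=list("ABCD"))

# Illustrative names, not from the original post.
start = pd.Timestamp("2012-01-01 12:12:12")
stop = pd.Timestamp("2012-01-03 12:12:12")

# The inner tuple addresses the two index levels; the trailing ":" is the
# column indexer, so slice(1, 1) is applied to the frequency level.
result = df.loc[(slice(start, stop), slice(1, 1)), :]
print(result)
```

Passing the column indexer explicitly, even when it is just ":", removes the ambiguity about which axis each element of the tuple refers to.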

4 Comments

I have not tried select; will try tomorrow. Shouldn't slicing on all the indices and columns work? Still interested in being able to do so.
There are some bugs in the multi-index slicing with datetime slicers; see here: github.com/pydata/pandas/issues/7429
Jeff, is it not more logical to do df.loc[idx['2012-01-01 12:12:12':'2012-01-03 12:12:12', 1], idx['A':'B']]? (so one idx with all selections for each axis). It's a detail, but maybe better to use it in a consistent style (and then you also don't need the extra brackets).
that works too; idx really just creates the correct tuples.
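The single-idx style discussed in these comments can be sketched as below. This assumes idx is pd.IndexSlice and a recent pandas release where the linked issue no longer applies; Timestamp bounds are used instead of strings, as the answer suggests, and the names start, stop and result are illustrative.

```python
import datetime

import numpy as np
import pandas as pd

# Rebuild the frame from the question.
dates = pd.DatetimeIndex(
    [datetime.datetime(2012, 1, 1, 12, 12, 12) + datetime.timedelta(days=i)
     for i in range(6)]
)
index = pd.MultiIndex.from_product([dates, [1, 2]], names=["date", "frequency"])
df = pd.DataFrame(np.random.randn(12, 4), index=index, columns=list("ABCD"))

idx = pd.IndexSlice  # helper that builds the slice tuples from [] syntax
start = pd.Timestamp("2012-01-01 12:12:12")
stop = pd.Timestamp("2012-01-03 12:12:12")

# One idx per axis: rows are (date range, frequency == 1), columns are A..B.
result = df.loc[idx[start:stop, 1], idx["A":"B"]]
print(result)
```

Here idx[start:stop, 1] evaluates to the tuple (slice(start, stop, None), 1), which is why the slice(...) spelling and the idx spelling are interchangeable.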
0

This is another case where a Panel, rather than a DataFrame, might have been appropriate.

pn = df.to_panel()

The result is a 4x6x2 panel that can be indexed more naturally:

pn['A':'B',pd.Timestamp('2012-01-01 12:12:12'):pd.Timestamp('2012-01-03 12:12:12'),1]
Out[24]:
                            A         B
date
2012-01-01 12:12:12  0.024273 -0.713160
2012-01-02 12:12:12 -1.075561  0.582569
2012-01-03 12:12:12  0.677187  0.973875

You could, of course, use the slice function instead.

1 Comment

Looks like Panel has been deprecated. "Warning In 0.20.0, Panel is deprecated and will be removed in a future version. See the section Deprecate Panel." pandas.pydata.org/pandas-docs/stable/dsintro.html#panel
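Since Panel is no longer available in current pandas, a result of the same shape (indexed by date, columns A and B) can be obtained from the MultiIndex frame directly. The following is only a sketch of one possible substitute, not what the answer proposed: it takes a cross-section on the frequency level with xs and then slices the remaining date index. The names start, stop and sub are illustrative.

```python
import datetime

import numpy as np
import pandas as pd

# Rebuild the frame from the question.
dates = pd.DatetimeIndex(
    [datetime.datetime(2012, 1, 1, 12, 12, 12) + datetime.timedelta(days=i)
     for i in range(6)]
)
index = pd.MultiIndex.from_product([dates, [1, 2]], names=["date", "frequency"])
df = pd.DataFrame(np.random.randn(12, 4), index=index, columns=list("ABCD"))

start = pd.Timestamp("2012-01-01 12:12:12")
stop = pd.Timestamp("2012-01-03 12:12:12")

# xs selects frequency == 1 and drops that level, leaving a plain
# date-indexed frame that ordinary label slicing can handle.
sub = df.xs(1, level="frequency").loc[start:stop, "A":"B"]
print(sub)
```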
