0

I have a dataframe with MultiIndex columns. I want to filter the data using the columns of the dataset. When I try df.columns I get this information:

MultiIndex(levels=[['power'], ['active']],
           codes=[[0], [0]],
           names=['physical_quantity', 'type'])

A short description of the dataset is:

physical_quantity          power
type                      active
2011-04-18 09:22:13-04:00    6.0
2011-04-18 09:22:16-04:00    6.0
2011-04-18 09:22:20-04:00    6.0
2011-04-18 09:22:23-04:00    6.0
2011-04-18 09:22:26-04:00    6.0

The first thing I found is that although you see two columns there, the dataframe really says that it is a dataset of [529757 rows x 1 columns].

What I want to do is to filter the data selecting an interval of time, choosing the first column called physical_quantity type.

On the other hand, the data of that first column (physical_quantity type) it is unknown:

physical_quantity  type  
power              active    float32
dtype: object

Checking with df.index I managed to see this information about the dataframe:

DatetimeIndex(['2011-04-18 09:22:13-04:00', '2011-04-18 09:22:16-04:00',
               '2011-04-18 09:22:20-04:00', '2011-04-18 09:22:23-04:00',
               '2011-04-18 09:22:26-04:00', '2011-04-18 09:22:30-04:00',
               '2011-04-18 09:22:33-04:00', '2011-04-18 09:22:37-04:00',
               '2011-04-18 09:22:40-04:00', '2011-04-18 09:22:44-04:00',
               ...
               '2011-05-14 23:59:26-04:00', '2011-05-14 23:59:29-04:00',
               '2011-05-14 23:59:33-04:00', '2011-05-14 23:59:36-04:00',
               '2011-05-14 23:59:40-04:00', '2011-05-14 23:59:43-04:00',
               '2011-05-14 23:59:46-04:00', '2011-05-14 23:59:50-04:00',
               '2011-05-14 23:59:53-04:00', '2011-05-14 23:59:57-04:00'],
              dtype='datetime64[ns, US/Eastern]', length=529757, freq=None)

So I understood that the data of that column is something like dtype='datetime64[ns, US/Eastern]

So I aim to slice the data, from an specific day and hour to another day and hour.

from 2011-05-10 19:44:51-04:00 to 2011-05-10 23:17:59-04:00

I tried doing something like this:

df[df['physical_quantity', 'type']] > 2011-05-10 19:44:51-04:00 
& 
df[df['physical_quantity', 'type']] < 2011-05-10 23:17:59-04:00
df[df['physical_quantity', 'type']] > 2011-05-10 19:44:51-04:00

File "<ipython-input-133-27848c7d6afc>", line 1
    df[df['physical_quantity', 'type']] > 2011-05-10 19:44:51-04:00
                                                ^
SyntaxError: invalid token

How can I solve my problem?

1
  • By itself, 2011-05-10 19:44:51-04:00 is not a valid object for Python, and the interpreter does not understand how to deal with this. Is it an int, a float, a string? You should be comparing compatible datatypes, in this case two np.datetime64 or datetime.datetime objects. Take a look at these two cases, in case they help. Commented Jul 3, 2019 at 14:28

1 Answer 1

1

try this

df['ts'] = pd.to_datetime(df["ts"], unit="ms")```
Sign up to request clarification or add additional context in comments.

1 Comment

Although this code snippet may answer the question, including an explanation of why and how it helps solve the problem improves the quality and longevity of your answer. See "How do I write a good answer?".

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.