Converting time string to minutes only in Python

Question

I have a dataset which contains several time features. These time features contain object data like so:

    12h 22min
    7 hours
    18 minutes
    27h 37min
    1h 35min
    2 hours
    NaN

As you can see, the time is represented in different formats and also contains NaN values. As part of my data preprocessing, I want to convert this object data to numeric form (strings to minutes).

I tried to adapt a solution from a similar question, like this:

def parse_time(time):
    if not pd.isna(time):
        mins = 0
        fields = time.split()
        print(fields) #inserted this line to debug why output was 0
        for idx in range(0, len(fields)-1):
            if fields[idx+1] in ('min', 'mins', 'minutes'):
                mins += int(fields[idx])
            elif fields[idx+1] in ('h', 'hour', 'hours'):
                mins += int(fields[idx]) * 60

        return mins

But when testing this function out, I realised that this will only work for data separated by spaces, which is not the case for my data:

   In[20]: parse_time('1h')
           ['1h']
   Out[20]: 0
   In[22]: parse_time('10h 50min')
           ['10h', '50min']
   Out[22]: 0
   In[24]: parse_time('10 h 50 min')
           ['10', 'h', '50', 'min']
   Out[24]: 650

Can anyone advise me what to change in my code so that this works, or offer an alternative, simpler solution?

Thanks :)

Quang Hoang · Accepted Answer · 2020-06-24 13:16:50Z

3

You can just use pd.to_timedelta:

pd.to_timedelta(df[0].fillna('0 min')
                    .str.replace('NaN', '0 m')
               )

Output:

0   0 days 12:22:00
1   0 days 07:00:00
2   0 days 00:18:00
3   1 days 03:37:00
4   0 days 01:35:00
5   0 days 02:00:00
6   0 days 00:00:00
Name: 0, dtype: timedelta64[ns]

Update: To get the periods in minutes:

pd.to_timedelta(df[0].fillna('0 min')
                    .str.replace('NaN', '0 m')
               ) / pd.to_timedelta('1 m')

Output:

0     742.0
1     420.0
2      18.0
3    1657.0
4      95.0
5     120.0
6       0.0
Name: 0, dtype: float64

Update 2: If you want to keep the NaN values, you can pass errors='coerce':

pd.to_timedelta(df[0], errors='coerce') / pd.to_timedelta('1 m')

Output:

0     742.0
1     420.0
2      18.0
3    1657.0
4      95.0
5     120.0
6       NaN
Name: 0, dtype: float64
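
For anyone who wants to reproduce this end to end, here is a minimal sketch that builds the sample column from the question and converts it to minutes. The Series name `duration` is an illustrative assumption, and `.dt.total_seconds() / 60` is used here as an alternative to dividing by a one-minute timedelta; both approaches should give float minutes with NaN preserved.

```python
import pandas as pd

# Sample values copied from the question; the Series name is illustrative.
durations = pd.Series(
    ['12h 22min', '7 hours', '18 minutes', '27h 37min',
     '1h 35min', '2 hours', float('nan')],
    name='duration',
)

# errors='coerce' turns anything unparseable (including NaN) into NaT
# instead of raising an exception.
as_timedelta = pd.to_timedelta(durations, errors='coerce')

# total_seconds() returns floats, so NaT becomes NaN after the division.
minutes = as_timedelta.dt.total_seconds() / 60
print(minutes)
```

If whole-number minutes are needed and missing values must be kept, casting the result with `.astype('Int64')` (pandas' nullable integer type) is one option.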

edited Jun 24, 2020 at 13:16

answered Jun 24, 2020 at 13:01


2 Comments

sums22 Over a year ago

I don't think this solves my problem. I want to convert a time string to minutes, how has this helped me do that?

sums22 Over a year ago

Is it better to use errors='ignore' rather than errors='coerce' here?

MrNobody33 · Accepted Answer · 2020-06-24 13:15:49Z

If you want to keep that function, you could use re.findall to split the stripped string into number and unit tokens, so it no longer depends on spaces between them:

import re

import pandas as pd

def parse_time(time):
    # Check for NaN before calling string methods: NaN is a float and has no .strip()
    if not pd.isna(time):
        mins = 0
        # Split into number and unit tokens, with or without spaces between them
        fields = re.findall(r'[A-Za-z]+|\d+', time.strip())
        print(fields)  # debug output, remove once the function works
        for idx in range(0, len(fields)-1):
            if fields[idx+1] in ('min', 'mins', 'minutes'):
                mins += int(fields[idx])
            elif fields[idx+1] in ('h', 'hour', 'hours'):
                mins += int(fields[idx]) * 60

        return mins

print(parse_time('20 hours 10min'))

print(parse_time('10h 50min'))

print(parse_time('10 h 50 min'))

Output:

['20', 'hours', '10', 'min']
1210
['10', 'h', '50', 'min']
650
['10', 'h', '50', 'min']
650
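
As a variation on the same regex idea, here is a hedged sketch that uses only the standard library, returns NaN for missing or unparseable input instead of None, and drops the debug print. The names `to_minutes`, `UNIT_TO_MINUTES`, and `TOKEN` are hypothetical, and the list of accepted unit spellings is an assumption that may need extending for other data.

```python
import math
import re

# Hypothetical unit table; extend it if the data uses other spellings.
UNIT_TO_MINUTES = {
    'h': 60, 'hr': 60, 'hrs': 60, 'hour': 60, 'hours': 60,
    'm': 1, 'min': 1, 'mins': 1, 'minute': 1, 'minutes': 1,
}

# A whole number, optional whitespace, then a unit word.
TOKEN = re.compile(r'(\d+)\s*([A-Za-z]+)')

def to_minutes(value):
    """Return total minutes for strings like '12h 22min', or NaN if not parseable."""
    if not isinstance(value, str):
        return math.nan  # covers float NaN and None
    total = 0
    matched = False
    for number, unit in TOKEN.findall(value):
        factor = UNIT_TO_MINUTES.get(unit.lower())
        if factor is None:
            return math.nan  # unknown unit: refuse to guess
        total += int(number) * factor
        matched = True
    return total if matched else math.nan

for sample in ['12h 22min', '7 hours', '10 h 50 min', float('nan')]:
    print(repr(sample), '->', to_minutes(sample))
```

On a DataFrame column this could be applied with something like `df['col'].map(to_minutes)`, where `col` stands in for the actual column name.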
