I have a dataset with several time features. These features are stored as strings (pandas object dtype), like so:
```
12h 22min
7 hours
18 minutes
27h 37min
1h 35min
2 hours
NaN
```
As you can see, the durations come in different formats (with or without a space between the number and the unit, abbreviated or full unit names), and some values are NaN. As part of my data preprocessing, I want to convert these strings to a numeric value: the total number of minutes.
I tried implementing a solution similar to the one here:
```python
import pandas as pd

def parse_time(time):
    if not pd.isna(time):
        mins = 0
        fields = time.split()
        print(fields)  # inserted this line to debug why output was 0
        for idx in range(0, len(fields)-1):
            if fields[idx+1] in ('min', 'mins', 'minutes'):
                mins += int(fields[idx])
            elif fields[idx+1] in ('h', 'hour', 'hours'):
                mins += int(fields[idx]) * 60
        return mins
```
But when testing the function, I realised it only works when the number and the unit are separated by a space, which is not the case for values such as 12h 22min in my data:
```
In[20]: parse_time('1h')
['1h']
Out[20]: 0
In[22]: parse_time('10h 50min')
['10h', '50min']
Out[22]: 0
In[24]: parse_time('10 h 50 min')
['10', 'h', '50', 'min']
Out[24]: 650
```
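The zeros come from str.split(), which only breaks on whitespace: '10h' and '50min' each stay a single token, so fields[idx+1] is never one of the unit names and nothing is added. One possible minimal change, sketched here under the assumption that every value is a sequence of number/unit pairs, is to insert a space between each digit and the letter that follows it before splitting. The helper name tokens is hypothetical, and it relies on Python's standard re module rather than anything in the original code.

```python
import re


def tokens(time):
    """Hypothetical helper: split a duration string into number/unit tokens.

    str.split() only breaks on whitespace, so '10h' would stay one token.
    Inserting a space between a digit and the letter right after it first
    gives the loop the separate number and unit tokens it expects.
    """
    return re.sub(r'(\d)([A-Za-z])', r'\1 \2', time).split()


print('10h 50min'.split())    # ['10h', '50min']
print(tokens('10h 50min'))    # ['10', 'h', '50', 'min']
print(tokens('10 h 50 min'))  # ['10', 'h', '50', 'min']
print(tokens('1h'))           # ['1', 'h']
```

Replacing fields = time.split() with fields = tokens(time) should let the existing loop handle '12h 22min' and '7 hours' alike.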
Can anyone advise what to change in my code so that it handles all of these formats, or suggest a simpler alternative?
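As a possibly simpler alternative, here is a sketch that drops the index loop and uses re.findall to pull out every (number, unit) pair directly, whatever the spacing. The names UNIT_MINUTES, TOKEN and time_to_minutes are hypothetical. It assumes the unit spellings are limited to those in the sample data plus the variants the original function accepted, and it treats any non-string value (NaN, None) as missing.

```python
import math
import re

# Minutes per unit: spellings from the sample data plus the variants the
# original function accepted. Extend this if the data has other units.
UNIT_MINUTES = {
    'h': 60, 'hour': 60, 'hours': 60,
    'min': 1, 'mins': 1, 'minutes': 1,
}

# A number, optional whitespace, then a run of letters: matches '12h',
# '12 h', '7 hours', '22min', ...
TOKEN = re.compile(r'(\d+)\s*([A-Za-z]+)')


def time_to_minutes(value):
    """Return the total minutes for a duration string, or NaN if missing."""
    if not isinstance(value, str):
        # Assumption: any non-string (NaN, None) means "no value".
        return math.nan
    total = 0
    for number, unit in TOKEN.findall(value):
        # An unexpected unit raises KeyError, so bad data fails loudly
        # instead of silently contributing 0 minutes.
        total += int(number) * UNIT_MINUTES[unit.lower()]
    return total


for raw in ['12h 22min', '7 hours', '18 minutes', '27h 37min', '1h 35min', '2 hours']:
    print(raw, '->', time_to_minutes(raw))
```

With pandas, this could be applied to a column with something like df['col'].map(time_to_minutes), where df and 'col' stand in for your DataFrame and column name; missing values come back as NaN. Raising on unknown units is a design choice; returning NaN for those entries instead would be equally easy if you prefer to inspect them afterwards.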
Thanks :)