I am trying to create a new column in a DataFrame by applying a function to a column that holds numbers as strings.
I have written a function to extract the numbers I want, tested it on a single string input, and confirmed that it works.
import re

SEARCH_PATTERN = r'([0-9]{1,2}) ([0-9]{2}):([0-9]{2}):([0-9]{2})'

def get_total_time_minutes(time_col, pattern=SEARCH_PATTERN):
    """Uses regex to parse time_col which is a string in the format 'd hh:mm:ss' to
    obtain a total time in minutes
    """
    # Seconds are captured by the pattern but not used in the total
    days, hours, minutes, _ = re.match(pattern, time_col).groups()
    total_time_minutes = (int(days)*24 + int(hours))*60 + int(minutes)
    return total_time_minutes
#test that the function works for a single input
text = "2 23:24:46"
print(get_total_time_minutes(text))
Output: 4284
#apply the function to the required columns
df['Minutes Available'] = df['Resource available (d hh:mm:ss)'].apply(get_total_time_minutes)
The picture below is a screenshot of my dataframe columns.
[Screenshot of my dataframe]
The 'Resources available (d hh:mm:ss)' column of my dataframe has pandas dtype 'O' (object, which typically holds strings, if my understanding is correct) and has data in the following format: '5 08:00:00'. When I call apply(get_total_time_minutes) on it, though, I get the following error:
TypeError: expected string or bytes-like object
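For what it's worth, this message is what re.match raises when it is handed something that is not a string. An object column can hold mixed types, so one possible cause (an assumption, since the question does not show the full column contents) is a missing value such as NaN, which pandas stores as a float. Below is a minimal sketch that reproduces the error without pandas and shows a guarded variant; the name get_total_time_minutes_safe is hypothetical and not part of the original code.

```python
import re

SEARCH_PATTERN = r'([0-9]{1,2}) ([0-9]{2}):([0-9]{2}):([0-9]{2})'

def get_total_time_minutes_safe(value, pattern=SEARCH_PATTERN):
    """Hypothetical guarded variant: returns None instead of raising
    when the input is not a string or does not match the pattern."""
    if not isinstance(value, str):
        return None
    match = re.match(pattern, value)
    if match is None:
        return None
    days, hours, minutes, _ = match.groups()
    return (int(days) * 24 + int(hours)) * 60 + int(minutes)

# Reproduce the TypeError by passing a float (NaN) to re.match
try:
    re.match(SEARCH_PATTERN, float("nan"))
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
print(get_total_time_minutes_safe("2 23:24:46"))   # 4284, same as the original function
print(get_total_time_minutes_safe(float("nan")))   # None
```

If this is indeed the cause, passing the guarded function to .apply() would leave the problematic rows empty instead of failing, which also makes them easy to locate afterwards.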
To clarify further, the "Resources Available" column is a string representing the total time in days, hours, minutes and seconds that the resource was available. I want to convert that time string to a total time in minutes, hence the regex and arithmetic within the get_total_time_minutes function. – Sam Ezebunandu
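As an alternative to a row-by-row function, the same conversion can be sketched with the pandas string accessor, which propagates missing values instead of raising. This is only a sketch under assumptions: the sample Series below is made up, and the real column may contain values in other formats.

```python
import pandas as pd

# Hypothetical sample data, including one missing value
s = pd.Series(["5 08:00:00", "2 23:24:46", None])

# One column per capture group: days, hours, minutes, seconds
parts = s.str.extract(r'(\d{1,2}) (\d{2}):(\d{2}):(\d{2})')

# Convert the captured strings to numbers; missing rows stay NaN
nums = parts.astype(float)

# Same arithmetic as get_total_time_minutes, applied to whole columns
minutes = (nums[0] * 24 + nums[1]) * 60 + nums[2]
print(minutes)
```

Rows that are missing or do not match the pattern come out as NaN, so the result is a float column.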
Use df.applymap() instead of .apply() because get_total_time_minutes() is designed to operate on each cell of your column, not the column itself as a vector.
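For context on this suggestion, here is a small sketch of what each method hands to the function, using a made-up one-column frame. As far as I know, Series.apply (a single column) already calls the function once per cell, while DataFrame.apply passes a whole column at a time, so the distinction matters mainly when .apply() is called on the DataFrame itself.

```python
import pandas as pd

# Hypothetical data standing in for the real dataframe
df = pd.DataFrame({"t": ["5 08:00:00", "2 23:24:46"]})

# Series.apply: the function receives one element at a time
seen_by_series_apply = df["t"].apply(lambda x: type(x).__name__).tolist()

# DataFrame.apply: the function receives each column as a Series
seen_by_frame_apply = df.apply(lambda col: type(col).__name__).tolist()

print(seen_by_series_apply)  # ['str', 'str']
print(seen_by_frame_apply)   # ['Series']
```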