I am trying to create a new column in a DataFrame by applying a function to a column that holds numbers as strings.

I have written the function to extract the numbers I want and tested it on a single string input and can confirm that it works.

import re

SEARCH_PATTERN = r'([0-9]{1,2}) ([0-9]{2}):([0-9]{2}):([0-9]{2})'
def get_total_time_minutes(time_col, pattern=SEARCH_PATTERN):
    """Uses regex to parse time_col which is a string in the format 'd hh:mm:ss' to
    obtain a total time in minutes
    """
    days, hours, minutes, _ = re.match(pattern, time_col).groups()
    total_time_minutes = (int(days)*24 + int(hours))*60 + int(minutes)
    return total_time_minutes

#test that the function works for a single input
text = "2 23:24:46"
print(get_total_time_minutes(text))

Output: 4284

#apply the function to the required columns
df['Minutes Available'] = df['Resource available (d hh:mm:ss)'].apply(get_total_time_minutes)

The picture below is a screenshot of my dataframe columns.

[Screenshot of my dataframe]

The 'Resources available (d hh:mm:ss)' column of my dataframe has pandas dtype 'O' (object, which usually means strings, if my understanding is correct) and holds data in the following format: '5 08:00:00'. However, when I call apply(get_total_time_minutes) on it, I get the following error:

TypeError: expected string or bytes-like object

To clarify further, the "Resources Available" column is a string representing the total time in days, hours, minutes and seconds that the resource was available. I want to convert that time string to a total time in minutes, hence the regex and arithmetic within the get_total_time_minutes function. – Sam Ezebunandu
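A possible cause, which the post does not confirm, is that an object column can hold non-string values such as NaN alongside the strings, and re.match raises exactly this TypeError when given a float. The sketch below reproduces the error with the standard library only and shows a guarded variant; the name get_total_time_minutes_safe and the sample values are hypothetical.

```python
import re

SEARCH_PATTERN = r'([0-9]{1,2}) ([0-9]{2}):([0-9]{2}):([0-9]{2})'

def get_total_time_minutes_safe(time_col, pattern=SEARCH_PATTERN):
    """Hypothetical guarded variant of get_total_time_minutes.

    Returns None for values that are not strings (for example NaN)
    or that do not match the 'd hh:mm:ss' pattern.
    """
    if not isinstance(time_col, str):
        return None
    match = re.match(pattern, time_col)
    if match is None:
        return None
    days, hours, minutes, _ = match.groups()
    return (int(days) * 24 + int(hours)) * 60 + int(minutes)

# re.match raises TypeError when it is handed a float such as NaN
try:
    re.match(SEARCH_PATTERN, float("nan"))
except TypeError as exc:
    error_message = str(exc)

# Made-up sample values: two valid strings and one missing value
values = ["5 08:00:00", float("nan"), "2 23:24:46"]
results = [get_total_time_minutes_safe(v) for v in values]
print(error_message)
print(results)
```

Passing a guarded function like this to apply would leave missing rows as None instead of failing, which also makes it easier to see which rows were not strings.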

  • Can you please add an example row of your dataframe df? Commented Aug 9, 2019 at 14:55
  • try .applymap() instead of .apply() because get_total_time_minutes() is designed to operate on each cell of your column; not the column itself as a vector. Commented Aug 9, 2019 at 15:09
  • It seems to be working for me:
    ```
    >>> d = pd.DataFrame({"Resource available (d hh:mm:ss)": ["2 23:24:46","3 23:12:45"]})
    >>> d['Minutes Available'] = d['Resource available (d hh:mm:ss)'].apply(get_total_time_minutes)
    >>> d
      Resource available (d hh:mm:ss)  Minutes Available
    0                      2 23:24:46               4284
    1                      3 23:12:45               5712
    ```
    Commented Aug 9, 2019 at 15:11
  • Thanks @AlexandreB. I have added a screenshot of my dataframe. Commented Aug 9, 2019 at 15:51
  • Hi @JeremyHue. I have added a screenshot of my dataframe. Commented Aug 9, 2019 at 16:24

1 Answer

This might be a bit hacky, because it uses pd.to_datetime to parse the string as a date and then turns it into a Timedelta by subtracting the default date (1900-01-01) that the parser fills in:

>>> pd.to_datetime('2 23:48:30', format='%d %H:%M:%S') - pd.to_datetime('0', format='%S')
Out[47]: Timedelta('1 days 23:48:30')

>>> Out[47] / pd.Timedelta('1 minute')
Out[50]: 2868.5

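But it does tell you how many minutes elapsed in those days and hours. Note that the result is one day short of the day count in the string ('2 23:48:30' gives 1 days 23:48:30), because %d is read as a day of the month, which starts at 1. It's also vectorised, so you can apply it to whole columns and get your minute values a lot faster than with apply.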
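If the column holds strings such as '5 08:00:00' that are really durations, another vectorised route is to rewrite them into the 'N days hh:mm:ss' form and parse them with pd.to_timedelta. This is a minimal sketch rather than the answer's method, and the DataFrame below is made-up sample data that assumes every row is a well-formed string.

```python
import pandas as pd

# Made-up sample data in the 'd hh:mm:ss' format from the question
df = pd.DataFrame(
    {"Resource available (d hh:mm:ss)": ["2 23:24:46", "5 08:00:00"]}
)
col = df["Resource available (d hh:mm:ss)"]

# Turn '2 23:24:46' into '2 days 23:24:46' by replacing the first space only
as_text = col.str.replace(" ", " days ", n=1, regex=False)

# Parse the whole column at once into Timedelta values
durations = pd.to_timedelta(as_text)

# Whole minutes, discarding leftover seconds as the regex version does
df["Minutes Available"] = (durations.dt.total_seconds() // 60).astype(int)
print(df)
```

Because the strings are parsed as durations rather than dates, the day count is used as written, so '2 23:24:46' gives 4284 minutes, matching the regex function in the question.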

6 Comments

TIL the Out object.
If you're dealing with big data, it can also be a real memory drain
The column is actually a time-delta in days hours:minutes:seconds and not a time stamp.
If your column is already pd.Timedelta, then just take the column and divide it by pd.Timedelta('1 minute').
Thanks, @ifly6 This works! I can get rid of the convoluted regex and keep things simple.