Background:

I have a pandas DataFrame containing a tweet column and a weather column.

Objective:
I am trying to extract the datestamp from the weather column (e.g. the datestamp for row index 0 is '(2020-07-14)') and save it in a new date column, so that I can filter on it, e.g. to keep only the rows with the latest date.

I know how to convert a string column to a datestamp if the values look like '20140512'. However, I have no idea how to identify a datestamp in the current format and extract it into a new column.
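For reference, the "known" case above is typically a single pd.to_datetime call with an explicit format string. A minimal sketch, using made-up sample values (the Series below is hypothetical, not from the original DataFrame):

```python
import pandas as pd

# Hypothetical column of compact YYYYMMDD strings, like the '20140512' example.
s = pd.Series(['20140512', '20200714'])

# format='%Y%m%d' tells pandas exactly how each string is laid out.
dates = pd.to_datetime(s, format='%Y%m%d')
print(dates)
```

The difficulty in this question is only the step before this: pulling the date substring out of a longer string.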

Any advice would be greatly appreciated.

  • Is it always inside the weather column, in (YYYY-MM-DD) format? Commented Jul 17, 2020 at 20:50
  • Hi Derek - yes, the format remains consistent. Commented Jul 17, 2020 at 20:53

1 Answer

You could do something like this, assuming the date is always in the weather column and always has the same formatting:

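import pandas as pd
# Raw string (r'...') so \( and \d are passed to the regex engine, not treated as Python escapes.
df['date'] = pd.to_datetime(df['weather'].str.extract(r'\((\d{4}-\d{2}-\d{2})\)')[0])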
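To try the str.extract approach on its own, here is a self-contained sketch. The weather strings are assumptions (the real values were only shown in a screenshot); they just contain a "(YYYY-MM-DD)" stamp somewhere. It uses expand=False, which returns a Series for a single capture group, so the [0] indexing is not needed.

```python
import pandas as pd

# Hypothetical sample data: only the "(YYYY-MM-DD)" part matches the question.
df = pd.DataFrame({
    'tweet': ['first tweet', 'second tweet', 'third tweet'],
    'weather': [
        'Sunny, 25C (2020-07-14)',
        'Cloudy, 19C (2020-07-15)',
        'Rain, 16C (2020-07-15)',
    ],
})

# The capture group keeps only the date inside the parentheses.
pattern = r'\((\d{4}-\d{2}-\d{2})\)'

# expand=False -> Series (one group); rows without a match would become NaT.
df['date'] = pd.to_datetime(
    df['weather'].str.extract(pattern, expand=False),
    format='%Y-%m-%d',
)
print(df[['weather', 'date']])
```

Passing the format explicitly is optional here, but it makes the expected layout obvious and avoids pandas having to infer it.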

or

import re
# Note: raises AttributeError on any row without a match, because re.search returns None there.
df['date'] = pd.to_datetime(df['weather'].apply(lambda x: re.search(r'\((\d{4}-\d{2}-\d{2})\)', x).group(1)))
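If some rows might lack a date stamp, a small helper that returns None instead of calling .group on a failed match avoids the crash; those rows become NaT after conversion. The sketch below then does the filtering the question asks for, keeping only rows with the latest date. The data and the helper name extract_date are hypothetical; the last row deliberately has no stamp.

```python
import re
import pandas as pd

# Hypothetical sample data; the last row has no "(YYYY-MM-DD)" stamp on purpose.
df = pd.DataFrame({
    'tweet': ['a', 'b', 'c', 'd'],
    'weather': [
        'Sunny, 25C (2020-07-14)',
        'Cloudy, 19C (2020-07-15)',
        'Rain, 16C (2020-07-15)',
        'no stamp',
    ],
})

DATE_RE = re.compile(r'\((\d{4}-\d{2}-\d{2})\)')

def extract_date(text):
    """Return the YYYY-MM-DD string inside parentheses, or None if absent."""
    if not isinstance(text, str):  # e.g. NaN in the weather column
        return None
    match = DATE_RE.search(text)
    return match.group(1) if match else None

# None values become NaT instead of raising.
df['date'] = pd.to_datetime(df['weather'].apply(extract_date), format='%Y-%m-%d')

# max() skips NaT; keep only the rows stamped with the most recent date.
latest = df[df['date'] == df['date'].max()]
print(latest)
```

Comparing against df['date'].max() keeps every row that shares the latest date, not just one.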