0

I have the following dataframe

column 1   Description                          Extracted Data
date       January 15,2020 is important day

I want to get the following result

column 1   Description                          Extracted Data
date       January 15,2020 is important day     January 15,2020

df.loc[df['column 1']=='date','Extracted Data']=df['Description'].str.extract(r'((January)|[/. ])|(\d{1,2}|[/., ]|\d{4})')

but I am not getting the desired result. Instead, I am getting a dataframe with all NaN values. How can I fix this?

1
  • check out this Commented Feb 11, 2020 at 7:09

3 Answers

1

Use .* (any run of characters) followed by a comma and four digits (\d{4}), all inside a single capture group.

import pandas as pd

df = pd.DataFrame({'column 1': ['date'], 'Description': ['January 15,2020 is important day']})
df['Extracted Data'] = df['Description'].str.extract(r'(.*,\d{4})')

Output:

  column 1                       Description   Extracted Data
0     date  January 15,2020 is important day  January 15,2020
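
Note that .* is greedy, so the pattern above also captures any text that precedes the date. As a minimal sketch, assuming the dates always look like "Month D,YYYY", a narrower pattern can be combined with the 'column 1' filter from the question. The second row and the names pattern and extracted are illustrative additions, not part of the original data.

```python
import pandas as pd

df = pd.DataFrame({
    'column 1': ['date', 'other'],
    'Description': ['January 15,2020 is important day', 'no date in this row'],
})

# One capture group: month name, day, comma, optional space, four-digit year.
pattern = r'([A-Za-z]+ \d{1,2},\s?\d{4})'

# expand=False returns a Series (one value per row) instead of a DataFrame.
extracted = df['Description'].str.extract(pattern, expand=False)

# Keep the extracted value only where 'column 1' is 'date'; other rows become NaN.
df['Extracted Data'] = extracted.where(df['column 1'] == 'date')
print(df)
```

Using a single capture group keeps the result one-dimensional, which makes it straightforward to assign to a single column.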

1

This works as a one-liner:

import re
df['Extracted data'] = [re.match(r'[A-Za-z]+ \d{2},\d{4}', x)[0] for x in df['Description']]

output:

  column1                              Desc   Extracted data
0    date  January 15,2020 is important day  January 15,2020

Regex Link: https://regex101.com/r/ICDJCp/1
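
The one-liner raises a TypeError when a description does not start with a date, because re.match returns None in that case. A possible variant, sketched below, uses re.search so the date may appear anywhere in the text and returns None when nothing matches. The helper name extract_date, the relaxed pattern, and the sample strings are assumptions made for illustration.

```python
import re

# Month name, day (1 or 2 digits), comma, optional space, four-digit year.
DATE_PATTERN = re.compile(r'[A-Za-z]+ \d{1,2},\s?\d{4}')

def extract_date(text):
    """Return the first 'Month D,YYYY' substring in text, or None if absent."""
    match = DATE_PATTERN.search(text)
    return match.group(0) if match else None

descriptions = [
    'January 15,2020 is important day',
    'Due on March 3, 2021',
    'no date here',
]
results = [extract_date(d) for d in descriptions]
print(results)  # ['January 15,2020', 'March 3, 2021', None]
```

The same function can be applied to a dataframe column with df['Description'].apply(extract_date).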


0

import dateutil.parser as dparser
import pandas as pd

df = pd.DataFrame({'column 1': ['date'], 'Description': ['January 15,2020 is important day']})
df['Extracted Data'] = df['Description'].apply(lambda x: dparser.parse(x,fuzzy=True).strftime('%B %d %Y'))
print(df)

Output:

  column 1                       Description   Extracted Data
0     date  January 15,2020 is important day  January 15 2020
