I have a column called 'Excel_Date' in a pandas DataFrame. The column's data looks like this:

Excel_Date
Before Q1 2018
Before Q1 2014
Before Q4 2018
42457
42457
42520
nan
nan

The column's dtype is dtype('O').

I have no idea how to convert this into proper dates.

Desired Output

Excel_Date
Before Q1 2018 #Or even better: the first month and day of Q1 (1/1/2018)
Before Q1 2014 #Or even better: the first month and day of Q1 (1/1/2014)
Before Q4 2018 #Or even better: the first month and day of Q4 (10/1/2018)
3/28/2016
3/28/2016
5/30/2016
nan
nan

The '#Or even better: ...' part of the example would be amazing, but I understand that it could be a bit difficult.

What have I tried?

I tried to divide the problem into smaller subproblems:

 1. Create a column, with only the numeric values
 > df['Excel_Date2'] = df['Excel_Date'].str.extract(r"(\d*\.?\d+)", expand=True)

 2. After that, I tried to deal with the numbers. But I failed.    
 >import datetime as dt
 >import pandas as pd
 >pd.TimedeltaIndex(df['Excel_Date2'], unit='d') + dt.datetime(1899, 12, 30)

Many, many thanks in advance!

1 Answer

First get only numeric values and then use your solution:

s = pd.to_numeric(df['Excel_Date'], errors='coerce')

df['new'] = pd.to_timedelta(s, unit='d').add(pd.Timestamp(1899, 12, 30)).fillna(df['Excel_Date'])
print (df)
       Excel_Date                  new
0  Before Q1 2018       Before Q1 2018
1  Before Q1 2014       Before Q1 2014
2  Before Q4 2018       Before Q4 2018
3           42457  2016-03-28 00:00:00
4           42457  2016-03-28 00:00:00
5           42520  2016-05-30 00:00:00
6             NaN                  NaN
7             NaN                  NaN
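
As a variation on the approach above, here is a minimal self-contained sketch that converts the serial numbers with the `origin` parameter of `pd.to_datetime` and formats the dates as month/day/year strings, so the resulting column holds only strings and missing values. The sample frame is rebuilt from the question's data, and the zero-padded `%m/%d/%Y` format is an assumption (the question shows unpadded months).

```python
import pandas as pd

# Sample data rebuilt from the question: text labels, Excel serials, missing values.
df = pd.DataFrame({
    "Excel_Date": ["Before Q1 2018", "Before Q1 2014", "Before Q4 2018",
                   "42457", "42457", "42520", None, None]
})

# Text labels cannot be parsed as numbers, so they become NaN here.
serial = pd.to_numeric(df["Excel_Date"], errors="coerce")

# Count the serials as days from 1899-12-30, the same base date used above.
dates = pd.to_datetime(serial, unit="D", origin="1899-12-30")

# Format real dates as strings; rows without a date fall back to the original text.
df["new"] = dates.dt.strftime("%m/%d/%Y").fillna(df["Excel_Date"])
print(df)
```

Formatting before filling keeps the datetime values and the text labels from being mixed in one object column.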

Better still, extract the quarters, convert them to datetimes, and finally replace the missing values with the datetimes from the quarters:

df1 = df['Excel_Date'].str.extract(r"(Q[1-4])\s+([1-2]\d{3})", expand=True)
s1 = pd.to_datetime(df1[1] + df1[0])
s2 = pd.to_numeric(df['Excel_Date'], errors='coerce')

df['new'] = pd.to_timedelta(s2, unit='d').add(pd.Timestamp(1899, 12, 30)).fillna(s1)
print (df)
       Excel_Date        new
0  Before Q1 2018 2018-01-01
1  Before Q1 2014 2014-01-01
2  Before Q4 2018 2018-10-01
3           42457 2016-03-28
4           42457 2016-03-28
5           42520 2016-05-30
6             NaN        NaT
7             NaN        NaT
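
If you would rather not depend on the parser recognizing strings such as `2018Q1`, the quarter start can also be computed by hand. The following is only a sketch under that assumption: it rebuilds the question's sample data, and the names `parts`, `has_quarter`, and `quarter_start` are illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "Excel_Date": ["Before Q1 2018", "Before Q1 2014", "Before Q4 2018",
                   "42457", "42457", "42520", None, None]
})

# Column 0 holds the quarter number, column 1 the four-digit year.
parts = df["Excel_Date"].str.extract(r"Q([1-4])\s+(\d{4})")
has_quarter = parts[0].notna()

# The first month of quarter q is 3 * (q - 1) + 1, i.e. 1, 4, 7 or 10.
year = parts.loc[has_quarter, 1].astype(int)
month = (parts.loc[has_quarter, 0].astype(int) - 1) * 3 + 1
quarter_start = pd.to_datetime(
    pd.DataFrame({"year": year, "month": month, "day": 1})
)

# Excel serials counted as days from 1899-12-30; everything else becomes NaT.
serial = pd.to_numeric(df["Excel_Date"], errors="coerce")
from_serial = pd.to_datetime(serial, unit="D", origin="1899-12-30")

# Fill the gaps with the quarter starts, aligned on the original index.
df["new"] = from_serial.fillna(quarter_start.reindex(df.index))
print(df)
```

Rows that match neither pattern stay as NaT, so the column keeps a datetime dtype throughout.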

1 Comment

This is exactly what I was looking for. Both solutions are working perfectly. Thanks @jezrael!
