This question is similar to this one, and I originally answered it with this solution, but it turned out I had misread the question. However, I feel my answer would be useful for a slightly different use case, so I am posting it here.


Given a text file:

04/20/2009; 04/20/09; 4/20/09; 4/3/09
Mar-20-2009; Mar 20, 2009; March 20, 2009; Mar. 20, 2009; Mar 20 2009;
20 Mar 2009; 20 March 2009; 20 Mar. 2009; 20 March, 2009
Mar 20th, 2009; Mar 21st, 2009; Mar 22nd, 2009
Feb 2009; Sep 2009; Oct 2010
6/2008; 12/2009
2009; 2010

The file contains already-extracted dates in varying formats. The task is to read them into a data frame, sort them, and display the output in MM/DD/YYYY format.

Expected output:

0     06/01/2008
1     01/01/2009
2     02/01/2009
3     03/20/2009
4     03/20/2009
5     03/20/2009
6     03/20/2009
7     03/20/2009
8     03/20/2009
9     03/20/2009
10    03/20/2009
11    03/20/2009
12    03/20/2009
13    03/21/2009
14    03/22/2009
15    04/03/2009
16    04/20/2009
17    04/20/2009
18    04/20/2009
19    09/01/2009
20    12/01/2009
21    01/01/2010
22    10/01/2010

How can this be done in pandas?

Note: If the day is missing, assume the 1st; if the month is missing, assume January.

  • I have seen this somewhere. Commented Sep 6, 2017 at 7:42
  • @Bharathshetty yes. check the link. But the OP wanted something different. Commented Sep 6, 2017 at 7:43
  • Not that one. Maybe a coursera assignment I think. Commented Sep 6, 2017 at 7:43

2 Answers

It is simpler to omit apply and call reset_index only once. In my opinion, drop=True is also more readable than drop=1:

out = pd.to_datetime(df.stack()).sort_values().dt.strftime('%m/%d/%Y').reset_index(drop=True)
print(out)
0     06/01/2008
1     01/01/2009
2     02/01/2009
3     03/20/2009
4     03/20/2009
5     03/20/2009
6     03/20/2009
7     03/20/2009
8     03/20/2009
9     03/20/2009
10    03/20/2009
11    03/20/2009
12    03/20/2009
13    03/21/2009
14    03/22/2009
15    04/03/2009
16    04/20/2009
17    04/20/2009
18    04/20/2009
19    09/01/2009
20    12/01/2009
21    01/01/2010
22    10/01/2010
dtype: object

2 Comments

Nice... you removed the apply.
I think you want df.apply(pd.to_datetime).stack(); in that case apply is necessary. Maybe that was the original idea.
Reproducible setup (for an easy MCVE):

import pandas as pd
import io

text = '''04/20/2009; 04/20/09; 4/20/09; 4/3/09
Mar-20-2009; Mar 20, 2009; March 20, 2009; Mar. 20, 2009; Mar 20 2009;
20 Mar 2009; 20 March 2009; 20 Mar. 2009; 20 March, 2009
Mar 20th, 2009; Mar 21st, 2009; Mar 22nd, 2009
Feb 2009; Sep 2009; Oct 2010
6/2008; 12/2009
2009; 2010'''

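# Wrap the text in a file-like buffer so read_csv can consume it
buf = io.StringIO(text)

# A regex delimiter needs the Python engine; the raw string keeps the backslash in \s literal
df = pd.read_csv(buf, engine='python', delimiter=r';\s+', header=None).reset_index()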

df

            index               0               1               2  \
0      04/20/2009        04/20/09         4/20/09          4/3/09   
1     Mar-20-2009    Mar 20, 2009  March 20, 2009   Mar. 20, 2009   
2     20 Mar 2009   20 March 2009    20 Mar. 2009  20 March, 2009   
3  Mar 20th, 2009  Mar 21st, 2009  Mar 22nd, 2009            None   
4        Feb 2009        Sep 2009        Oct 2010            None   
5          6/2008         12/2009            None            None   
6            2009            2010            None            None   

              3  
0          None  
1  Mar 20 2009;  
2          None  
3          None  
4          None  
5          None  
6          None 

Replace buf with the name of your text file.
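The trailing semicolon in "Mar 20 2009;" in the frame above comes from the delimiter pattern: ';\s+' only matches a semicolon followed by whitespace. As a rough illustration, assuming the Python engine splits each line in much the same way as re.split, here is a standard-library sketch; the names line, fields and cleaned are just for this example:

```python
import re

# Second line of the sample file, which ends in a bare ';'
line = "Mar-20-2009; Mar 20, 2009; March 20, 2009; Mar. 20, 2009; Mar 20 2009;"

# ';\s+' needs at least one whitespace character after the semicolon,
# so the final ';' at the end of the line is not treated as a separator.
fields = re.split(r";\s+", line)
print(fields)

# Removing a trailing ';' from each field tidies up the last value.
cleaned = [f.strip().rstrip(";") for f in fields]
print(cleaned)
```

The date parser coped with the leftover semicolon in the run shown here, but stripping it first removes one source of surprises.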


You can use df.stack and pd.Series.apply, followed by pd.Series.sort_values.

out = df.stack().apply(pd.to_datetime)\
        .reset_index(drop=1)\
        .sort_values().dt.strftime('%m/%d/%Y')\
        .reset_index(drop=1)
print(out)

0     06/01/2008
1     01/01/2009
2     02/01/2009
3     03/20/2009
4     03/20/2009
5     03/20/2009
6     03/20/2009
7     03/20/2009
8     03/20/2009
9     03/20/2009
10    03/20/2009
11    03/20/2009
12    03/20/2009
13    03/21/2009
14    03/22/2009
15    04/03/2009
16    04/20/2009
17    04/20/2009
18    04/20/2009
19    09/01/2009
20    12/01/2009
21    01/01/2010
22    10/01/2010
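The rule from the question (a missing day becomes the 1st, a missing month becomes January) can also be checked without pandas. The following is a minimal sketch using only datetime.strptime, not what pd.to_datetime does internally. FORMATS and parse_date are hypothetical names, and the format list covers only a subset of the styles in the file:

```python
from datetime import datetime

# Hypothetical list of formats, tried in order; it covers only some of the
# styles from the question. %b and %B match English month names under the
# default C locale.
FORMATS = ["%m/%d/%Y", "%m/%d/%y", "%b %d, %Y", "%d %B %Y", "%b %Y", "%m/%Y", "%Y"]

def parse_date(text):
    """Return a datetime for the first format that matches.

    strptime fills fields absent from the format with defaults, so a missing
    day becomes 1 and a missing month becomes January.
    """
    for fmt in FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {text!r}")

samples = ["04/20/2009", "4/3/09", "Mar 20, 2009", "20 March 2009",
           "Feb 2009", "6/2008", "2009"]

# Sort the parsed datetimes, then render them as MM/DD/YYYY strings
formatted = [d.strftime("%m/%d/%Y") for d in sorted(parse_date(s) for s in samples)]
print(formatted)
```

Sorting happens on the parsed datetime values, before formatting, for the same reason sort_values comes before dt.strftime in the pandas chain: MM/DD/YYYY strings do not sort chronologically.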
