Python Pandas Cleaning columns with multiple dates

Question

I have a dataframe with a column looking like this:

Event date
1/3/2013
11/01/2011-10/01/2012
11/01/2011-10/01/2012
11/01/2011-10/01/2012
10/01/2012 - 02/18/2013
2/12/2013
01/18/2013-01/23/2013
11/01/2012-01/19/2013

Is there a good way to separate the dates into two columns, like

df['Start date']
df['end date']

where a row with a single date uses that date as the start date by default?

Phillip Cloud · Accepted Answer · 2014-06-07 20:02:51Z

You can also use Series.str.extract() here to do this all in one fell swoop:

In [22]: df
Out[22]:
                event_date
0                 1/3/2013
1    11/01/2011-10/01/2012
2    11/01/2011-10/01/2012
3    11/01/2011-10/01/2012
4  10/01/2012 - 02/18/2013
5                2/12/2013
6    01/18/2013-01/23/2013
7    11/01/2012-01/19/2013

In [23]: df.event_date.str.extract(r'(?P<all>(?P<start>\d{1,2}/\d{1,2}/\d{4})\s*-?\s*(?P<end>\d{1,2}/\d{1,2}/\d{4})?)')
Out[23]:
                       all       start         end
0                 1/3/2013    1/3/2013         NaN
1    11/01/2011-10/01/2012  11/01/2011  10/01/2012
2    11/01/2011-10/01/2012  11/01/2011  10/01/2012
3    11/01/2011-10/01/2012  11/01/2011  10/01/2012
4  10/01/2012 - 02/18/2013  10/01/2012  02/18/2013
5                2/12/2013   2/12/2013         NaN
6    01/18/2013-01/23/2013  01/18/2013  01/23/2013
7    11/01/2012-01/19/2013  11/01/2012  01/19/2013
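If actual datetime values are wanted rather than strings, the extracted columns can be passed through pd.to_datetime afterwards. The following is only a sketch: the shortened sample frame and the `dates` variable are made up for illustration, the outer `all` group is dropped, and the `%m/%d/%Y` format assumes month-first dates (which the sample value 02/18/2013 suggests).

```python
import pandas as pd

# Shortened, hypothetical sample in the same shape as the question's column.
df = pd.DataFrame({'event_date': ['1/3/2013',
                                  '11/01/2011-10/01/2012',
                                  '10/01/2012 - 02/18/2013']})

# Same pattern as above, without the outer "all" group.
pattern = (r'(?P<start>\d{1,2}/\d{1,2}/\d{4})'
           r'\s*-?\s*'
           r'(?P<end>\d{1,2}/\d{1,2}/\d{4})?')

# One output column per named group; rows with a single date get NaN in "end".
dates = df['event_date'].str.extract(pattern)

# Assumption: month/day/year ordering. Missing end dates become NaT.
dates['start'] = pd.to_datetime(dates['start'], format='%m/%d/%Y')
dates['end'] = pd.to_datetime(dates['end'], format='%m/%d/%Y')

print(dates)
```

Rows that hold a single date keep it in `start` and end up with a missing value (NaT) in `end`.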

Karl D. · Accepted Answer · 2014-06-07 18:49:06Z


You can do something like the following using the vectorized string split:

>>> df

                event_date  x
0                 1/3/2013  1
1    11/01/2011-10/01/2012  1
2    11/01/2011-10/01/2012  1
3    11/01/2011-10/01/2012  1
4  10/01/2012 - 02/18/2013  1
5                2/12/2013  1
6    01/18/2013-01/23/2013  1
7    11/01/2012-01/19/2013  1


>>> df['beg'] = df['event_date'].str.split(r'\s*-\s*').str[0]
>>> df['end'] = df['event_date'].str.split(r'\s*-\s*').str[1]
>>> df

                event_date  x         beg         end
0                 1/3/2013  1    1/3/2013         NaN
1    11/01/2011-10/01/2012  1  11/01/2011  10/01/2012
2    11/01/2011-10/01/2012  1  11/01/2011  10/01/2012
3    11/01/2011-10/01/2012  1  11/01/2011  10/01/2012
4  10/01/2012 - 02/18/2013  1  10/01/2012  02/18/2013
5                2/12/2013  1   2/12/2013         NaN
6    01/18/2013-01/23/2013  1  01/18/2013  01/23/2013
7    11/01/2012-01/19/2013  1  11/01/2012  01/19/2013

Edit: As @DSM points out, you could also do something like the following:

>>> pd.DataFrame(df['event_date'].str.split(r'\s*-\s*').tolist(),
                  columns=['beg','end'])

          beg         end
0    1/3/2013        None
1  11/01/2011  10/01/2012
2  11/01/2011  10/01/2012
3  11/01/2011  10/01/2012
4  10/01/2012  02/18/2013
5   2/12/2013        None
6  01/18/2013  01/23/2013
7  11/01/2012  01/19/2013
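Newer pandas versions also accept expand=True in Series.str.split, which returns the pieces as DataFrame columns directly and avoids the tolist() round trip. A minimal sketch, assuming a shortened, made-up sample frame and a hypothetical `parts` variable; it also assumes at least one row contains a range, otherwise only one column comes back and the column assignment fails.

```python
import pandas as pd

# Shortened, hypothetical sample in the same shape as the question's column.
df = pd.DataFrame({'event_date': ['1/3/2013',
                                  '11/01/2011-10/01/2012',
                                  '10/01/2012 - 02/18/2013']})

# A multi-character pattern is treated as a regular expression,
# so optional whitespace around the hyphen is consumed by the split.
parts = df['event_date'].str.split(r'\s*-\s*', expand=True)

# Two columns result because the longest split has two pieces.
parts.columns = ['beg', 'end']

# Attach the new columns to the original frame.
df = df.join(parts)
print(df)
```

Single-date rows get a missing value in `end`, as in the outputs above.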

edited Jun 7, 2014 at 18:49

answered Jun 7, 2014 at 16:54


4 Comments

Phillip Cloud Over a year ago

What's the purpose of the x column?

DSM Over a year ago

Annoyingly the cleanest way I can think of to get a bunch of columns out after using str.split is something like pd.DataFrame(df["Event date"].str.split("\s*-\s*").tolist()).

Karl D. Over a year ago

No reason for the 'x' column. It's superfluous.

Karl D. Over a year ago

Yeah, I was going to add that as an alternative @DSM but then I didn't think it was any cleaner for this case. You're right though, if there are lots of resulting columns, that would be cleaner.
