I have a dataframe with a column looking like this:

Event date
1/3/2013
11/01/2011-10/01/2012
11/01/2011-10/01/2012
11/01/2011-10/01/2012
10/01/2012 - 02/18/2013
2/12/2013
01/18/2013-01/23/2013
11/01/2012-01/19/2013

Is there a good way to separate the dates into two columns like

df['Start date']
df['end date']

where a row with a single date treats that date as the start date and leaves the end date empty?

2 Answers

You can also use Series.str.extract() here to do this all in one fell swoop:

In [22]: df
Out[22]:
                event_date
0                 1/3/2013
1    11/01/2011-10/01/2012
2    11/01/2011-10/01/2012
3    11/01/2011-10/01/2012
4  10/01/2012 - 02/18/2013
5                2/12/2013
6    01/18/2013-01/23/2013
7    11/01/2012-01/19/2013

In [23]: df.event_date.str.extract(r'(?P<all>(?P<start>\d{1,2}/\d{1,2}/\d{4})\s*-?\s*(?P<end>\d{1,2}/\d{1,2}/\d{4})?)')
Out[23]:
                       all       start         end
0                 1/3/2013    1/3/2013         NaN
1    11/01/2011-10/01/2012  11/01/2011  10/01/2012
2    11/01/2011-10/01/2012  11/01/2011  10/01/2012
3    11/01/2011-10/01/2012  11/01/2011  10/01/2012
4  10/01/2012 - 02/18/2013  10/01/2012  02/18/2013
5                2/12/2013   2/12/2013         NaN
6    01/18/2013-01/23/2013  01/18/2013  01/23/2013
7    11/01/2012-01/19/2013  11/01/2012  01/19/2013
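
If only the two date columns are wanted on the original frame, the same idea can be sketched as below. This is a minimal, self-contained version that assumes a reasonably recent pandas; the sample rows are a subset of the question's data, the outer `all` group is dropped, and the column names `Start date` and `End date` are taken from the question rather than from the session above.

```python
import pandas as pd

# A few rows from the question, rebuilt so the snippet runs on its own.
df = pd.DataFrame({
    "event_date": [
        "1/3/2013",
        "11/01/2011-10/01/2012",
        "10/01/2012 - 02/18/2013",
        "2/12/2013",
    ]
})

# Same date pattern as in the answer, without the outer "all" group.
date = r"\d{1,2}/\d{1,2}/\d{4}"
pattern = rf"(?P<start>{date})\s*-?\s*(?P<end>{date})?"

# With more than one capture group, str.extract returns a DataFrame
# with one column per named group.
parts = df["event_date"].str.extract(pattern)

# Column names below follow the question; they are an assumption here.
df["Start date"] = parts["start"]
df["End date"] = parts["end"]

print(df)
```

Rows that hold a single date end up with a missing value in `End date`, which matches the "start date by default" behaviour asked for.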

You can do something like the following using the vectorized string split:

>>> df

                event_date  x
0                 1/3/2013  1
1    11/01/2011-10/01/2012  1
2    11/01/2011-10/01/2012  1
3    11/01/2011-10/01/2012  1
4  10/01/2012 - 02/18/2013  1
5                2/12/2013  1
6    01/18/2013-01/23/2013  1
7    11/01/2012-01/19/2013  1


>>> df['beg'] = df['event_date'].str.split(r'\s*-\s*').str[0]
>>> df['end'] = df['event_date'].str.split(r'\s*-\s*').str[1]
>>> df

                event_date  x         beg         end
0                 1/3/2013  1    1/3/2013         NaN
1    11/01/2011-10/01/2012  1  11/01/2011  10/01/2012
2    11/01/2011-10/01/2012  1  11/01/2011  10/01/2012
3    11/01/2011-10/01/2012  1  11/01/2011  10/01/2012
4  10/01/2012 - 02/18/2013  1  10/01/2012  02/18/2013
5                2/12/2013  1   2/12/2013         NaN
6    01/18/2013-01/23/2013  1  01/18/2013  01/23/2013
7    11/01/2012-01/19/2013  1  11/01/2012  01/19/2013

Edit: As @DSM points out, you could also do something like the following:

>>> pd.DataFrame(df['event_date'].str.split(r'\s*-\s*').tolist(),
                  columns=['beg','end'])

          beg         end
0    1/3/2013        None
1  11/01/2011  10/01/2012
2  11/01/2011  10/01/2012
3  11/01/2011  10/01/2012
4  10/01/2012  02/18/2013
5   2/12/2013        None
6  01/18/2013  01/23/2013
7  11/01/2012  01/19/2013
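
Another way to get the pieces out as columns is to pass `expand=True` to `str.split`, which returns a DataFrame directly. The sketch below is not from the original session; it assumes a pandas version that supports `expand` and uses a subset of the question's rows. `n=1` limits the split to one separator, so at most two columns come back.

```python
import pandas as pd

# A few rows from the question, rebuilt so the snippet runs on its own.
df = pd.DataFrame({
    "event_date": [
        "1/3/2013",
        "11/01/2011-10/01/2012",
        "10/01/2012 - 02/18/2013",
        "2/12/2013",
    ]
})

# Split on a hyphen with optional surrounding whitespace.
# expand=True spreads the pieces across columns; rows with no hyphen
# get a missing value in the second column.
parts = df["event_date"].str.split(r"\s*-\s*", n=1, expand=True)

# Assumes at least one row contains a range, so two columns exist.
parts.columns = ["beg", "end"]

df = df.join(parts)
print(df)
```

If no row contained a range, `parts` would have a single column and the rename would fail, so a real pipeline may want to check `parts.shape[1]` first.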

4 Comments

What's the purpose of the x column?
Annoyingly the cleanest way I can think of to get a bunch of columns out after using str.split is something like pd.DataFrame(df["Event date"].str.split("\s*-\s*").tolist()).
No reason for the 'x' column. It's superfluous.
Yeah, I was going to add that as an alternative @DSM but then I didn't think it was any cleaner for this case. You're right though, if there are lots of resulting columns, that would be cleaner.
