I need some help or advice on wrangling dates into a Pandas DataFrame. I have a Python list that looks like this:

['',
 '20180715:1700-20180716:1600',
 '20180716:1700-20180717:1600',
 '20180717:1700-20180718:1600',
 '20180718:1700-20180719:1600',
 '20180719:1700-20180720:1600',
 '20180721:CLOSED',
 '20180722:1700-20180723:1600',
 '20180723:1700-20180724:1600',
 '20180724:1700-20180725:1600',
 '20180725:1700-20180726:1600',
 '20180726:1700-20180727:1600',
 '20180728:CLOSED']

Is there an easy way to transform this into a Pandas DataFrame with two columns (start time and end time)?

1 Answer

Sample:

L = ['',
 '20180715:1700-20180716:1600',
 '20180716:1700-20180717:1600',
 '20180717:1700-20180718:1600',
 '20180718:1700-20180719:1600',
 '20180719:1700-20180720:1600',
 '20180721:CLOSED',
 '20180722:1700-20180723:1600',
 '20180723:1700-20180724:1600',
 '20180724:1700-20180725:1600',
 '20180725:1700-20180726:1600',
 '20180726:1700-20180727:1600',
 '20180728:CLOSED']

I think the best approach here is a list comprehension that splits each string on the '-' separator and filters out values that do not contain it:

import pandas as pd

# Keep only entries that contain '-', then split each into [start, end]
df = pd.DataFrame([x.split('-') for x in L if '-' in x], columns=['start','end'])
print(df)
           start            end
0  20180715:1700  20180716:1600
1  20180716:1700  20180717:1600
2  20180717:1700  20180718:1600
3  20180718:1700  20180719:1600
4  20180719:1700  20180720:1600
5  20180722:1700  20180723:1600
6  20180723:1700  20180724:1600
7  20180724:1700  20180725:1600
8  20180725:1700  20180726:1600
9  20180726:1700  20180727:1600

A pure pandas solution is also possible, especially if you need to process a Series. Here str.split and dropna are used:

s = pd.Series(L)

# Rows without '-' get a missing value in column 1, so dropna removes them
df = s.str.split('-', expand=True).dropna(subset=[1])
df.columns = ['start','end']
print(df)
            start            end
1   20180715:1700  20180716:1600
2   20180716:1700  20180717:1600
3   20180717:1700  20180718:1600
4   20180718:1700  20180719:1600
5   20180719:1700  20180720:1600
7   20180722:1700  20180723:1600
8   20180723:1700  20180724:1600
9   20180724:1700  20180725:1600
10  20180725:1700  20180726:1600
11  20180726:1700  20180727:1600
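
The Series version keeps the original list positions as its index (1, 2, ... 11, with gaps where rows were dropped), while the list comprehension version is numbered from 0. If a clean 0-based index is wanted, a minimal sketch, assuming a shortened sample list for brevity, is to call reset_index with drop=True:

```python
import pandas as pd

# Shortened sample (assumption: same shape as the list in the question)
L = ['',
     '20180715:1700-20180716:1600',
     '20180721:CLOSED',
     '20180722:1700-20180723:1600']

s = pd.Series(L)
df = s.str.split('-', expand=True).dropna(subset=[1])
df.columns = ['start', 'end']

# Discard the old positions and renumber the remaining rows from 0
df = df.reset_index(drop=True)
print(df)
```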

5 Comments

Both work, but the columns are obviously not datetimes yet. Should I use apply to transform them?
@steff - Hmm, so the list is different? Are there some nested values?
I have a function that parses it into the right shape, but it's quite specific to this case. I can simply apply that.
@steff - So you need to convert the output columns to datetimes as the last step?
@steff - So you need df = df.apply(lambda x: pd.to_datetime(x, format='%Y%m%d:%H%M'))?
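
The conversion suggested in the last comment can be sketched end to end as follows. This assumes every remaining value matches the YYYYMMDD:HHMM pattern, and it uses a shortened sample list for brevity:

```python
import pandas as pd

# Shortened sample (assumption: same shape as the list in the question)
L = ['',
     '20180715:1700-20180716:1600',
     '20180721:CLOSED',
     '20180722:1700-20180723:1600']

df = pd.DataFrame([x.split('-') for x in L if '-' in x],
                  columns=['start', 'end'])

# Parse each column with an explicit format: date, a colon, then HHMM
df = df.apply(lambda col: pd.to_datetime(col, format='%Y%m%d:%H%M'))
print(df.dtypes)
```

Passing an explicit format avoids relying on pandas guessing the layout, and a value that does not match it raises an error rather than being parsed incorrectly.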
