How to split a column into two or multiple columns in python using either str.split or regex?

Question

How can I split this column into two or more columns? I've used str.split('/', 2), but it just removed the '/' and did not split the values into two columns.

X
East Bound: 6900 / West Bound: 7700
East Bound: 7800 / West Bound: 8700
North Bound: 5000 / South Bound: 4900
North Bound: 7000 / South Bound: 9000
East Bound: 4900 / West Bound: 9700

What I want is:

First Direction	Second direction
East Bound: 6900	West Bound: 7700
East Bound: 7800	West Bound: 8700
North Bound: 5000	South Bound: 4900
North Bound: 7000	South Bound: 9000
East Bound: 4900	West Bound: 9700

Even better would be four columns, one for each cardinal direction, filled with the values from the first table, such as:

North	South	East	West
0	0	6900	7700
0	0	7800	8700
5000	4900	0	0
7000	9000	0	0
0	0	4900	9700

If I have read the documentation correctly, I believe this can be done with regex patterns, but is there an efficient and concise way to do it?

Here is the original df for use: df = ['East Bound: 6900 / West Bound: 7700', 'East Bound: 7800 / West Bound: 8700', 'North Bound: 5000 / South Bound: 4900', 'North Bound: 7000 / South Bound: 9000', 'East Bound: 4900 / West Bound: 9700']
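The answers below index the data as df['X'], so they assume a pandas DataFrame with a column named X rather than the plain list shown above. A minimal sketch of that setup (the variable name data is an illustrative choice, not from the question):

```python
import pandas as pd

# The list from the question; 'X' is the column header used in the question's table.
data = [
    'East Bound: 6900 / West Bound: 7700',
    'East Bound: 7800 / West Bound: 8700',
    'North Bound: 5000 / South Bound: 4900',
    'North Bound: 7000 / South Bound: 9000',
    'East Bound: 4900 / West Bound: 9700',
]

# Wrap it as a one-column DataFrame so that df['X'] works as in the answers.
df = pd.DataFrame({'X': data})
print(df.shape)  # (5, 1)
```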

This is really two questions, as you've provided two different outputs. As to the first: have you checked the docs for Series.str.split? Specifically the parameter expand (bool), default False: "Expand the split strings into separate columns."
– G. Anderson, Commented Jun 29, 2022 at 18:54
I have, and when I had expand = 'true' the code spat out an error. It worked for false. Also, the output had 4 rows just because I thought 4 was enough to get the point across for the second possibility, but I can add a 5th row for further clarification. @JonClements
– BeginnerProgrammer, Commented Jun 29, 2022 at 18:59
@BeginnerProgrammer oh it's fine... it just looks odd that you have 5 rows in but only 4 out. Given the small sample size, it's not clear whether you've just managed to leave one out or whether there's some logic you haven't mentioned that would exclude a row.
– Jon Clements, Commented Jun 29, 2022 at 19:37
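To illustrate the expand point: by default str.split returns a single column whose cells are Python lists, which matches the symptom of the '/' disappearing without any new columns appearing. Passing the boolean True (not the string 'true') spreads the pieces into separate columns. A small sketch on two of the rows:

```python
import pandas as pd

df = pd.DataFrame({'X': ['East Bound: 6900 / West Bound: 7700',
                         'North Bound: 5000 / South Bound: 4900']})

# Default (expand=False): still one column, each cell holding a Python list,
# so the separator is gone but nothing is spread across new columns.
as_lists = df['X'].str.split(' / ')
print(type(as_lists).__name__, as_lists[0])
# Series ['East Bound: 6900', 'West Bound: 7700']

# expand=True (a real boolean): one column per piece.
as_frame = df['X'].str.split(' / ', expand=True)
print(as_frame.shape)  # (2, 2)
```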

Ynjxsjmh · Accepted Answer · 2022-06-29 19:10:42Z

For Q1, you can use .str.split with expand=True:

df[['First Direction', 'Second direction']] = df['X'].str.split(' / ', expand=True)

print(df)

                                       X     First Direction    Second direction
0    East Bound: 6900 / West Bound: 7700   East Bound: 6900     West Bound: 7700
1    East Bound: 7800 / West Bound: 8700   East Bound: 7800     West Bound: 8700
2  North Bound: 5000 / South Bound: 4900  North Bound: 5000    South Bound: 4900
3  North Bound: 7000 / South Bound: 9000  North Bound: 7000    South Bound: 9000
4    East Bound: 4900 / West Bound: 9700   East Bound: 4900     West Bound: 9700

For Q2, you can convert each value in column X to a dictionary, then expand the dictionaries into separate columns:

out = df['X'].apply(lambda x: dict([direction.split(': ') for direction in x.split(' / ')])).apply(pd.Series)

print(out)

  East Bound West Bound North Bound South Bound
0       6900       7700         NaN         NaN
1       7800       8700         NaN         NaN
2        NaN        NaN        5000        4900
3        NaN        NaN        7000        9000
4       4900       9700         NaN         NaN
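The output above still has NaN where a direction is missing and holds the numbers as strings, whereas the question asks for zeros and columns named North, South, East, West. One possible post-processing sketch, which is an addition and not part of the answer (it renames the columns, reorders them, and converts with pd.to_numeric):

```python
import pandas as pd

df = pd.DataFrame({'X': [
    'East Bound: 6900 / West Bound: 7700',
    'East Bound: 7800 / West Bound: 8700',
    'North Bound: 5000 / South Bound: 4900',
    'North Bound: 7000 / South Bound: 9000',
    'East Bound: 4900 / West Bound: 9700',
]})

# Same idea as the answer: each row becomes {'East Bound': '6900', ...};
# splitting on ': ' keeps leading spaces out of the values.
out = df['X'].apply(
    lambda x: dict(part.split(': ') for part in x.split(' / '))
).apply(pd.Series)

# Addition: drop the ' Bound' suffix and use the question's column order.
out.columns = [c.replace(' Bound', '') for c in out.columns]
out = out.reindex(columns=['North', 'South', 'East', 'West'])

# Addition: strings -> numbers, missing directions -> 0, then integers.
out = out.apply(pd.to_numeric).fillna(0).astype(int)
print(out)
```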

edited Jun 29, 2022 at 19:10

answered Jun 29, 2022 at 19:00

Ynjxsjmh

BeRT2me Over a year ago

Didn't know there was an expand=True option for string functions; this is better than my version, though I would change '/' to ' / ', or add .applymap(str.strip) to the end.
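The whitespace point can be checked directly. A sketch comparing a bare '/' split followed by stripping (done column by column with Series.str.strip here, as an alternative to the applymap call mentioned) with a separator pattern that absorbs the spaces:

```python
import pandas as pd

df = pd.DataFrame({'X': ['East Bound: 6900 / West Bound: 7700',
                         'North Bound: 5000 / South Bound: 4900']})

# Splitting on a bare '/' leaves the surrounding spaces on each piece.
raw = df['X'].str.split('/', expand=True)
print(repr(raw.iloc[0, 0]))  # 'East Bound: 6900 '

# Option 1: strip every resulting column.
stripped = raw.apply(lambda col: col.str.strip())

# Option 2: a pattern longer than one character is treated as a regular
# expression, so the separator itself can swallow the whitespace.
by_regex = df['X'].str.split(r'\s*/\s*', expand=True)
print(by_regex)
```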

BeginnerProgrammer Over a year ago

Thank you! @Ynjxsjmh I mainly wanted my second question answered, but when I apply your code to my df, I get an error: 'dictionary update sequence element #1 has length 1; 2 is required'. Could you explain your line of code for me? From an initial look, we are applying a dictionary split across the column, but that's where my understanding ends.

Ynjxsjmh Over a year ago

@BeginnerProgrammer Updated the answer. Idea is that dict([('a', 1), ('b', 2)]) returns a dictionary.
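The dict(...) idea, and one way the quoted error can arise, in plain Python. The malformed cell below (missing its ':') is hypothetical; it only shows the mechanism, not necessarily what happened with the asker's data:

```python
# dict() accepts an iterable of 2-item sequences and treats them as key/value pairs.
pairs = [part.split(': ') for part in 'East Bound: 6900 / West Bound: 7700'.split(' / ')]
print(pairs)        # [['East Bound', '6900'], ['West Bound', '7700']]
print(dict(pairs))  # {'East Bound': '6900', 'West Bound': '7700'}

# Hypothetical malformed cell: the second piece has no ':' to split on,
# so it yields a 1-item list and dict() rejects it.
bad = [part.split(':') for part in 'East Bound: 6900 / West Bound 7700'.split(' / ')]
msg = None
try:
    dict(bad)
except ValueError as exc:
    msg = str(exc)
print(msg)  # dictionary update sequence element #1 has length 1; 2 is required
```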

Jon Clements Over a year ago

@BeginnerProgrammer does

df['X'].str.extractall(r'(?P<direction>North|South|West|East) (?:Bound): (?P<n>\d+)').reset_index().rename(columns={'level_0': 'ID'}).pivot(index='ID', columns='direction', values='n')

also get you close?
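To see what that chain does, here is a sketch on two rows that keeps the intermediate extractall result. The pattern uses a literal Bound instead of the non-capturing group (?:Bound), which matches the same text:

```python
import pandas as pd

df = pd.DataFrame({'X': ['East Bound: 6900 / West Bound: 7700',
                         'North Bound: 5000 / South Bound: 4900']})

pattern = r'(?P<direction>North|South|West|East) Bound: (?P<n>\d+)'

# extractall returns one row per match, indexed by (original row, match number).
matches = df['X'].str.extractall(pattern)
print(matches)

# reset_index() turns the unnamed first index level into a 'level_0' column,
# which is renamed to 'ID' and used as the pivot's row labels.
wide = (matches.reset_index()
               .rename(columns={'level_0': 'ID'})
               .pivot(index='ID', columns='direction', values='n'))
print(wide)
```

Note that n is still text here and missing directions are NaN; the other approaches in this thread show ways to convert and fill them.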

BeRT2me Over a year ago

Not sure which is more efficient, but the second version can also be written as: df.apply(lambda x: dict([direction.split(':') for direction in x.X.split(' / ')]), axis=1, result_type='expand')
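A quick sketch checking that the row-wise DataFrame.apply variant and the Series.apply(...).apply(pd.Series) chain produce the same columns; parse_directions is a hypothetical helper name, and it splits on ': ' to avoid leading spaces. Which one is faster was not settled here, so timing both on real data (for example with timeit) would be the way to decide.

```python
import pandas as pd

df = pd.DataFrame({'X': ['East Bound: 6900 / West Bound: 7700',
                         'North Bound: 5000 / South Bound: 4900']})

def parse_directions(text):
    # Hypothetical helper: 'East Bound: 6900 / ...' -> {'East Bound': '6900', ...}
    return dict(part.split(': ') for part in text.split(' / '))

# Row-wise DataFrame.apply; result_type='expand' turns the returned dicts into columns.
via_frame = df.apply(lambda row: parse_directions(row['X']), axis=1, result_type='expand')

# Series.apply followed by .apply(pd.Series), as in the answer.
via_series = df['X'].apply(parse_directions).apply(pd.Series)

print(via_frame)
print(via_series)
```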

Jon Clements · Accepted Answer · 2022-06-29 19:41:43Z

My approach would be to use Series.str.extractall with a specific pattern to get the direction and the amount, convert the amount to a suitable type (I've just gone for integer here), and then use pivot_table, filling in zeros where a direction is missing, e.g.:

out = (
    df['X'].str.extractall(r'(?P<bound>North|South|West|East) (?:Bound): (?P<n>\d+)')
    .astype({'n': int})
    .pivot_table(index=pd.Grouper(level=0), columns='bound', values='n', fill_value=0)
)

This'll give you:

bound  East  North  South  West
0      6900      0      0  7700
1      7800      0      0  8700
2         0   5000   4900     0
3         0   7000   9000     0
4      4900      0      0  9700

This retains your original DataFrame's index, so you can merge/join the result back to your original DataFrame later.
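A sketch of the merge/join step, reusing the answer's pipeline unchanged. The reindex to the question's column order and the removal of the 'bound' columns label are optional additions, not part of the answer:

```python
import pandas as pd

df = pd.DataFrame({'X': [
    'East Bound: 6900 / West Bound: 7700',
    'East Bound: 7800 / West Bound: 8700',
    'North Bound: 5000 / South Bound: 4900',
    'North Bound: 7000 / South Bound: 9000',
    'East Bound: 4900 / West Bound: 9700',
]})

# The answer's pipeline: one row per original row, one column per direction.
out = (
    df['X'].str.extractall(r'(?P<bound>North|South|West|East) (?:Bound): (?P<n>\d+)')
    .astype({'n': int})
    .pivot_table(index=pd.Grouper(level=0), columns='bound', values='n', fill_value=0)
)

# Optional: the question's column order, without the 'bound' label on the columns.
out = out.reindex(columns=['North', 'South', 'East', 'West'], fill_value=0)
out.columns.name = None

# The index still lines up with df's, so an index-aligned join works directly.
combined = df.join(out)
print(combined)
```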

Collectives™ on Stack Overflow

How to split a column into two or multiple columns columns in python using either str.split or regex?

2 Answers 2

11 Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

11 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related