
How can I split this column into two or more columns? I've used str.split('/', 2), but it only removed the '/' and did not split the values into two columns.

X
East Bound: 6900 / West Bound: 7700
East Bound: 7800 / West Bound: 8700
North Bound: 5000 / South Bound: 4900
North Bound: 7000 / South Bound: 9000
East Bound: 4900 / West Bound: 9700

What I want is:

First Direction     Second direction
East Bound: 6900    West Bound: 7700
East Bound: 7800    West Bound: 8700
North Bound: 5000   South Bound: 4900
North Bound: 7000   South Bound: 9000
East Bound: 4900    West Bound: 9700

Even better would be four columns, one for each cardinal direction, filled with the values from the first table, such as:

North  South  East  West
    0      0  6900  7700
    0      0  7800  8700
 5000   4900     0     0
 7000   9000     0     0
    0      0  4900  9700

If I have read the documentation correctly, I believe this can be done with regex patterns, but is there an efficient and concise way to do it?

Here is the original data for reuse: df = ['East Bound: 6900 / West Bound: 7700', 'East Bound: 7800 / West Bound: 8700', 'North Bound: 5000 / South Bound: 4900', 'North Bound: 7000 / South Bound: 9000', 'East Bound: 4900 / West Bound: 9700']
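Note that df above is a plain list, while the answers index it as df['X']. That assumes the list has been wrapped in a DataFrame with a single column named X; a minimal setup under that assumption:

```python
import pandas as pd

data = [
    'East Bound: 6900 / West Bound: 7700',
    'East Bound: 7800 / West Bound: 8700',
    'North Bound: 5000 / South Bound: 4900',
    'North Bound: 7000 / South Bound: 9000',
    'East Bound: 4900 / West Bound: 9700',
]
# Wrap the list in a one-column DataFrame named X so that df['X'] works
df = pd.DataFrame({'X': data})
print(df.shape)  # (5, 1)
```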

  • This is really two questions, as you've provided two different outputs. As to the first: have you checked the docs for Series.str.split? Specifically, the parameter expand (bool, default False): Expand the split strings into separate columns. Commented Jun 29, 2022 at 18:54
  • Also, why does the input have 5 rows and the output only 4? Commented Jun 29, 2022 at 18:57
  • I have, and when I set expand = 'true' the code threw an error. It worked with False. Also, the output has 4 rows just because I thought 4 was enough to get the point across for the second option, but I can add a 5th row for clarification. @JonClements Commented Jun 29, 2022 at 18:59
  • @BeginnerProgrammer Oh, it's fine... it just looks odd that you have 5 rows in but only 4 out. Given the small sample size, it's not clear whether you've just left one out or whether there is some logic you haven't mentioned that would exclude a row. Commented Jun 29, 2022 at 19:37
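On the expand point from the comments: by default, str.split leaves a single column whose cells are Python lists, which matches what the question describes. Passing expand=True (the boolean, as documented, rather than the string 'true') puts each piece in its own column. A minimal sketch on a two-row excerpt of the data; the names as_lists and as_columns are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({'X': [
    'East Bound: 6900 / West Bound: 7700',
    'North Bound: 5000 / South Bound: 4900',
]})

# Default expand=False: still one column, each cell holds a list
as_lists = df['X'].str.split(' / ')
print(as_lists.iloc[0])  # ['East Bound: 6900', 'West Bound: 7700']

# expand=True: one DataFrame column per piece (labelled 0 and 1)
as_columns = df['X'].str.split(' / ', expand=True)
print(as_columns.shape)  # (2, 2)
```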

2 Answers


For Q1, you can use .str.split with expand=True:

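# Split on ' / ' (spaces included) so neither piece keeps stray whitespace; expand=True returns columns
df[['First Direction', 'Second direction']] = df['X'].str.split(' / ', expand=True)
print(df)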

                                       X     First Direction    Second direction
0    East Bound: 6900 / West Bound: 7700   East Bound: 6900     West Bound: 7700
1    East Bound: 7800 / West Bound: 8700   East Bound: 7800     West Bound: 8700
2  North Bound: 5000 / South Bound: 4900  North Bound: 5000    South Bound: 4900
3  North Bound: 7000 / South Bound: 9000  North Bound: 7000    South Bound: 9000
4    East Bound: 4900 / West Bound: 9700   East Bound: 4900     West Bound: 9700

For Q2, you can convert each value in column X to a dictionary, then expand the dictionaries into separate columns:

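import pandas as pd
# Turn each row into {'East Bound': '6900', 'West Bound': '7700'}, then expand the dicts into columns
out = df['X'].apply(lambda x: dict(direction.split(': ') for direction in x.split(' / '))).apply(pd.Series)
print(out)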

  East Bound West Bound North Bound South Bound
0       6900       7700         NaN         NaN
1       7800       8700         NaN         NaN
2        NaN        NaN        5000        4900
3        NaN        NaN        7000        9000
4       4900       9700         NaN         NaN
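The output above keeps NaN for missing directions and the 'East Bound'-style headers. If you want exactly the four-column table from the question (North, South, East, West, with 0 where a direction is absent), one possible follow-up using the same dictionary idea is the sketch below. It assumes every value is a whole number; the helper to_pairs is hypothetical, and it builds the frame with pd.DataFrame from a list of dicts instead of .apply(pd.Series).

```python
import pandas as pd

df = pd.DataFrame({'X': [
    'East Bound: 6900 / West Bound: 7700',
    'East Bound: 7800 / West Bound: 8700',
    'North Bound: 5000 / South Bound: 4900',
    'North Bound: 7000 / South Bound: 9000',
    'East Bound: 4900 / West Bound: 9700',
]})

def to_pairs(text):
    # Hypothetical helper:
    # 'North Bound: 5000 / South Bound: 4900' -> {'North': '5000', 'South': '4900'}
    return dict(part.replace(' Bound', '').split(': ') for part in text.split(' / '))

# One dict per row; a direction missing from a row becomes NaN
out = pd.DataFrame(df['X'].map(to_pairs).tolist(), index=df.index)

# Values arrive as strings: convert them, fix the column order, fill gaps with 0
out = (
    out.apply(pd.to_numeric)
    .reindex(columns=['North', 'South', 'East', 'West'])
    .fillna(0)
    .astype(int)
)
print(out)
```

Using reindex pins the column order to the one requested, even if a direction never appears in the data (that column would then be all zeros).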

Comments

I didn't know there was an expand=True option for string functions; this is better than my version, though I would change '/' to ' / ', or add .applymap(str.strip) at the end.
Thank you, @Ynjxsjmh! I really wanted my second question answered, but when I apply your code to my df, I get the error 'dictionary update sequence element #1 has length 1; 2 is required'. Could you explain your line of code? From an initial look, we are building a dictionary from the split values in the column, but that's where my understanding ends.
@BeginnerProgrammer Updated the answer. The idea is that dict() builds a dictionary from a list of key/value pairs: dict([('a', 1), ('b', 2)]) returns {'a': 1, 'b': 2}.
@BeginnerProgrammer Does df['X'].str.extractall(r'(?P<direction>North|South|West|East) (?:Bound): (?P<n>\d+)').reset_index().rename(columns={'level_0': 'ID'}).pivot(index='ID', columns='direction', values='n') also get you close?
Not sure which is more efficient, but the second version can also be written as: df.apply(lambda x: dict([direction.split(':') for direction in x.X.split(' / ')]), axis=1, result_type='expand')
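The dict idea from the comments can be tried on a single row. This minimal sketch also shows one way the quoted error can arise: if a piece lacks the ': ' separator, split() returns a one-item list and dict() rejects it. That is an assumption about the cause, not a diagnosis of the asker's data; the names pairs and bad_pairs are illustrative.

```python
# dict() builds a dictionary from an iterable of (key, value) pairs
print(dict([('a', 1), ('b', 2)]))  # {'a': 1, 'b': 2}

row = 'East Bound: 6900 / West Bound: 7700'
# Split the row into two pieces, then each piece into [key, value]
pairs = [piece.split(': ') for piece in row.split(' / ')]
print(pairs)        # [['East Bound', '6900'], ['West Bound', '7700']]
print(dict(pairs))  # {'East Bound': '6900', 'West Bound': '7700'}

# A piece without ': ' splits into a single item, which dict() cannot use
bad_pairs = [['East Bound', '6900'], ['West Bound 7700']]
try:
    dict(bad_pairs)
except ValueError as err:
    print(err)  # dictionary update sequence element #1 has length 1; 2 is required
```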

My approach would be to use Series.str.extractall with a specific pattern to capture the direction and the amount, convert the amount to a suitable type (I've just gone for integer here), and then use pivot_table, filling in zeros where a direction is missing, e.g.:

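import pandas as pd

out = (
    # One row per match: the direction name ('bound') and its number ('n')
    df['X'].str.extractall(r'(?P<bound>North|South|West|East) (?:Bound): (?P<n>\d+)')
    .astype({'n': int})
    # Group by the original row (index level 0); absent directions become 0
    .pivot_table(index=pd.Grouper(level=0), columns='bound', values='n', fill_value=0)
)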

This'll give you:

bound  East  North  South  West
0      6900      0      0  7700
1      7800      0      0  8700
2         0   5000   4900     0
3         0   7000   9000     0
4      4900      0      0  9700

This keeps your original DataFrame's index, so you can merge/join the result back onto the original DataFrame later.
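A self-contained sketch of that last point, rebuilding df from the question's list. It also keeps the intermediate extractall result, whose first index level is the original row label and whose second level is the match number; that is why pd.Grouper(level=0) works as the pivot index. The names matches and result are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'X': [
    'East Bound: 6900 / West Bound: 7700',
    'East Bound: 7800 / West Bound: 8700',
    'North Bound: 5000 / South Bound: 4900',
    'North Bound: 7000 / South Bound: 9000',
    'East Bound: 4900 / West Bound: 9700',
]})

# Index levels: the original row label, then 'match' (0 or 1 within each row)
matches = df['X'].str.extractall(r'(?P<bound>North|South|West|East) (?:Bound): (?P<n>\d+)')

out = (
    matches.astype({'n': int})
    .pivot_table(index=pd.Grouper(level=0), columns='bound', values='n', fill_value=0)
)

# out is indexed by df's row labels, so join lines the rows up with the original
result = df.join(out)
print(result)
```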
