Python string split on multiple characters

Question

df = pd.DataFrame({'columnA': ['apple:50-100(+)', 'peach:75-125(-)', 'banana:100-150(+)']})

I'm new to regular expressions. If I want to split 'apple:50-100(+)' (and the other example strings above) into a DataFrame as shown below, what's the best way to do that?

Desired output:

Can you provide some more context for this? How many strings? Where are the strings? What format do they follow?
– AMC, Commented Dec 10, 2019 at 3:09
Many strings in the format 'apple:50-100(+)' and 'peach:50-100(-)'. They are in a column in a DataFrame.
– Cactus Philosopher, Commented Dec 10, 2019 at 3:11
Ah, well that's important information! Could you post an example of the column?
– AMC, Commented Dec 10, 2019 at 3:12
Can you share more about the first part of the string? Is it always just a single word, letters a-z?
– AMC, Commented Dec 10, 2019 at 3:20
Please don't post images of code/data/tracebacks. Just copy the text, paste it in your question and format it as code.
– wwii, Commented Dec 10, 2019 at 3:40

AMC · Accepted Answer · 2019-12-10 03:18:09Z

4

I can update the regex if you provide more details on the format.

import pandas as pd

df = pd.DataFrame({'columnA': ['apple:50-100(+)', 'peach:75-125(-)', 'banana:100-150(+)']})

pattern = r"(.*):(\d+)-(\d+)\(([+-])\)"

new_df = df['columnA'].str.extract(pattern)

df:

             columnA
0    apple:50-100(+)
1    peach:75-125(-)
2  banana:100-150(+)

new_df:

        0    1    2  3
0   apple   50  100  +
1   peach   75  125  -
2  banana  100  150  +
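If named columns and numeric types are wanted, a minimal sketch along the same lines could use named groups in the pattern. The column names label, start, end and sign are assumptions chosen for illustration; they do not come from the question.

```python
import pandas as pd

df = pd.DataFrame({'columnA': ['apple:50-100(+)', 'peach:75-125(-)', 'banana:100-150(+)']})

# Named groups (?P<name>...) become the column names of the extracted frame.
# The names label/start/end/sign are hypothetical, chosen for this sketch.
pattern = r"(?P<label>.*):(?P<start>\d+)-(?P<end>\d+)\((?P<sign>[+-])\)"

new_df = df['columnA'].str.extract(pattern)

# str.extract returns strings, so convert the numeric fields explicitly.
new_df = new_df.astype({'start': int, 'end': int})

print(new_df)
```

Rows that do not match the pattern come back as missing values, so the integer conversion would need extra handling if the column contains malformed strings.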

answered Dec 10, 2019 at 3:18

AMC


4 Comments

bbd108 Over a year ago

This is the correct answer for pandas. TipsyHyena, take a look at the other pandas .str accessors here: pandas.pydata.org/pandas-docs/stable/reference/…

Cactus Philosopher Over a year ago

Do you mind directing me to the documentation for this notation here r"(.*):(\d+)-(\d+)\(([+-])\)"? Not familiar with regex.

bbd108 Over a year ago

The best resource to get started with regex, imo, is regexone.com. Others may have better recommendations.

AMC Over a year ago

@TipsyHyena I really like Regex101, it's how I wrote the solution for this, here. regular-expressions.info is also nice as a reference/guide.

Iain Shelvington · Accepted Answer · 2019-12-10 03:18:19Z

1

re.split can be used to split a string wherever a pattern matches. For the example you have given, the following should work:

re.split(r'[\:\-\(\)]+', your_string)

This splits the string on every run of colons, hyphens and parentheses (":", "-", "(" and ")").

Because the string ends with a delimiter, the last element of the resulting list is an empty string. You can either slice it off:

re.split(r'[\:\-\(\)]+', your_string)[:-1]

Or filter out empty values (in Python 3, filter returns a lazy iterator, so wrap it in list to get a list back):

list(filter(None, re.split(r'[\:\-\(\)]+', your_string)))
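One caveat worth checking before applying this to the whole column: for strings ending in "(-)", the hyphen inside the parentheses is itself a delimiter, so the sign is swallowed along with the parentheses. A hedged sketch of a variant that only treats a hyphen between two digits as a separator is shown here; the helper name split_fields is hypothetical.

```python
import re

# Split on ':' '(' ')' or on a hyphen that sits between two digits.
# The lookarounds keep the '-' inside "(-)" from being treated as a separator.
SEPARATOR = re.compile(r'[:()]|(?<=\d)-(?=\d)')

def split_fields(text):
    """Hypothetical helper: split text into its fields and drop empty pieces."""
    return [part for part in SEPARATOR.split(text) if part]

# The original character class loses the sign on a "(-)" string.
original = [p for p in re.split(r'[\:\-\(\)]+', 'peach:75-125(-)') if p]
print(original)                         # ['peach', '75', '125']

print(split_fields('peach:75-125(-)'))  # ['peach', '75', '125', '-']
print(split_fields('apple:50-100(+)'))  # ['apple', '50', '100', '+']
```

This assumes the two numbers are always separated by a hyphen with digits on both sides, as in the example strings.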

edited Dec 10, 2019 at 3:18

answered Dec 10, 2019 at 3:00

Iain Shelvington


2 Comments

adhg Over a year ago

assuming the split is by : - +

Cactus Philosopher Over a year ago

Using re.split on the example string yields ['apple', ':', '50', '-', '100', '(', '+', ')', '']. Now how can I transform this list into a DataFrame as in the question? pd.DataFrame(re.split('(\:|\-|\(|\))', 'apple:50-100(+)')) isn't quite right.

accdias · Accepted Answer · 2019-12-10 03:03:45Z

0

Here is an alternative:

Python 3.7.5 (default, Oct 17 2019, 12:16:48) 
[GCC 9.2.1 20190827 (Red Hat 9.2.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> import pandas as pd
>>> split_it = re.compile(r'(\w+):(\d+)[-](\d+)\((.)\)')
>>> df = pd.DataFrame(split_it.findall('apple:50-100(+)'))
>>> df
       0   1    2  3
0  apple  50  100  +
>>>
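To apply the same compiled pattern to every string in a DataFrame column, one possible sketch is to parse each value and build a new frame from the results. The helper name parse and the choice to return None values for non-matching strings are assumptions made for this example.

```python
import re
import pandas as pd

split_it = re.compile(r'(\w+):(\d+)-(\d+)\((.)\)')

df = pd.DataFrame({'columnA': ['apple:50-100(+)', 'peach:75-125(-)', 'banana:100-150(+)']})

def parse(text):
    """Hypothetical helper: return the four captured fields, or four None values on no match."""
    match = split_it.fullmatch(text)
    return match.groups() if match else (None,) * 4

# One row per input string, aligned with the original index.
parsed = pd.DataFrame([parse(s) for s in df['columnA']], index=df.index)
print(parsed)
```

Using fullmatch rather than findall means a string with trailing or leading junk yields a row of missing values instead of a partial match.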

answered Dec 10, 2019 at 3:03

accdias


2 Comments

Cactus Philosopher Over a year ago

Can this function take a dataframe column as input?

accdias Over a year ago

Probably yes, but it would be better if you edited your post and showed us a real sample of your data.
