4

Given a string:

str = "apple AND orange OR banana"

I want to split it on "AND" or "OR". The expected result is:

['apple', 'orange', 'banana']

Is there a simple way to do this in Python?

Thanks!

1
  • 4
    Please don't use str as a variable name. str is a well-known builtin, and you are asking for all kinds of trouble by overriding it. Commented Mar 18, 2015 at 19:23

4 Answers

7

You can use a regex to split on any run of one or more uppercase letters:

>>> import re
>>> tr = "apple AND orange OR banana"
>>> re.split(r'[A-Z]+',tr)
['apple ', ' orange ', ' banana']

But if you just want to split on AND or OR:

>>> re.split(r'AND|OR',tr)
['apple ', ' orange ', ' banana']
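
One caveat with both patterns above: `[A-Z]+` splits on any capital letter, and a bare `AND|OR` also matches inside longer uppercase words. Here is a minimal sketch, assuming the delimiters always appear as whole words. The name `split_on_keywords` is made up for illustration and is not from the answer:

```python
import re

# Hypothetical helper: split on AND / OR only when they stand alone as words.
# \b is a word boundary, so the "AND" inside "SANDWICH" is left untouched;
# \s* absorbs the whitespace around each delimiter.
KEYWORD_RE = re.compile(r'\s*\b(?:AND|OR)\b\s*')

def split_on_keywords(text):
    return KEYWORD_RE.split(text)

print(split_on_keywords("apple AND orange OR banana"))
# ['apple', 'orange', 'banana']
print(split_on_keywords("Apple AND SANDWICH OR Banana"))
# ['Apple', 'SANDWICH', 'Banana']
```

If your items never contain uppercase letters, the shorter patterns are fine as they are.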

To remove the spaces as well, provided you are sure the items themselves contain no spaces or uppercase letters, you can do:

>>> re.split(r'[A-Z ]+',tr)
['apple', 'orange', 'banana']

If there is an AND or OR at the start or end of your string, split will produce an empty string in the result. To get rid of it you can loop over the split list and filter out the empty items, but a more elegant way is to use re.findall with r'[^A-Z ]+' as its pattern:

>>> tr = "AND apple AND orangeOR banana"
>>> re.split(r'\s?(?:AND|OR)\s?',tr)
['', 'apple', 'orange', 'banana']
>>> re.split(r'[A-Z ]+',tr)
['', 'apple', 'orange', 'banana']
>>> [i for i in re.split(r'[A-Z ]+',tr) if i]
['apple', 'orange', 'banana']
>>> re.findall(r'[^A-Z ]+',tr)
['apple', 'orange', 'banana']

3 Comments

Use r'\s+(?:AND|OR)\s+' and the whitespace is gone, too ;-)
Your last code will fail if there is an OR or AND at the beginning or end of the string
@PadraicCunningham Yes, I missed the ?, but I have fixed the answer now.
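
A detail worth knowing when grouping the alternatives for re.split: if the group is a capturing one, the matched delimiters are returned as part of the result. A small sketch of the difference, using the string from the question (the variable names are just for illustration):

```python
import re

s = "apple AND orange OR banana"

# A capturing group makes re.split keep the matched delimiters in the result.
with_capture = re.split(r'\s+(AND|OR)\s+', s)
print(with_capture)      # ['apple', 'AND', 'orange', 'OR', 'banana']

# A non-capturing group (?:...) drops them.
without_capture = re.split(r'\s+(?:AND|OR)\s+', s)
print(without_capture)   # ['apple', 'orange', 'banana']
```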
3

I can think of two ways to accomplish this:

In [230]: s = "apple AND orange OR banana"

In [231]: delims = ["AND", "OR"]

In [232]: for d in delims:
   .....:     s = s.replace(d, '-')
   .....:     

In [233]: s.split('-')
Out[233]: ['apple ', ' orange ', ' banana']

Or:

In [234]: s = "apple AND orange OR banana"

In [235]: delims = ["AND", "OR"]

In [236]: for d in delims:
   .....:     s = s.replace(d, ' ')
   .....:     

In [237]: s.split()
Out[237]: ['apple', 'orange', 'banana']

2 Comments

Besides using two for loops this is the cleanest way I could think of.
You'd need to find a delimiter that's not in your string already. Which can be challenging.
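
One way to sidestep the placeholder problem is to build a single regex from the same delims list, so no intermediate character is needed. This is a rough sketch that swaps the replace loop for re.split, not the answer's own method. `split_on_delims` is a hypothetical name:

```python
import re

def split_on_delims(text, delims):
    # Longest first, so a delimiter is not shadowed by a shorter one
    # that happens to be its prefix.
    ordered = sorted(delims, key=len, reverse=True)
    # re.escape guards against delimiters containing regex metacharacters.
    alternatives = '|'.join(re.escape(d) for d in ordered)
    pattern = r'\s*(?:' + alternatives + r')\s*'
    return re.split(pattern, text)

print(split_on_delims("apple-pie AND orange OR banana", ["AND", "OR"]))
# ['apple-pie', 'orange', 'banana']
```

The '-' in the input survives because nothing is ever replaced with a placeholder.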
2

You can split and filter with a set:

s = "apple AND orange OR banana"

print([word for word in s.split() if word not in {"AND","OR"}])

['apple', 'orange', 'banana']
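
Note that this filters word by word, so an item such as "red apple" would come back as two separate entries. If that matters, here is a possible variation, assuming items may span several words. `split_phrases` and `DELIMS` are names invented for this sketch:

```python
from itertools import groupby

DELIMS = {"AND", "OR"}

def split_phrases(text):
    words = text.split()
    # groupby yields runs of consecutive words sharing the same key:
    # True for delimiter words, False for ordinary ones.
    return [" ".join(run)
            for is_delim, run in groupby(words, key=lambda w: w in DELIMS)
            if not is_delim]

print(split_phrases("red apple AND orange OR ripe banana"))
# ['red apple', 'orange', 'ripe banana']
```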

1

Why not use filter and re.split like this:

import re
my_list = list(filter(None, re.split(r"\s*(?:AND|OR)\s*", my_str)))

This will work even in the case that AND or OR is at the very beginning of your string. Also, you should be aware that str is a pretty terrible variable name since it is a built-in.

This gives the output:

['apple', 'orange', 'banana']
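
For a quick check of the leading and trailing case, the one-liner can be wrapped in a function. This is only a sketch, and `split_terms` is a made-up name:

```python
import re

def split_terms(text):
    # filter(None, ...) drops the empty strings that re.split produces
    # when a delimiter sits at the start or end of the input.
    return list(filter(None, re.split(r"\s*(?:AND|OR)\s*", text)))

print(split_terms("AND apple AND orange OR banana OR"))
# ['apple', 'orange', 'banana']
```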
