Splitting a string using re module of python

Question

I have a string:

import re
s = 'count_EVENT_GENRE in [1,2,3,4,5]'
# I want to capture only the field 'count_EVENT_GENRE'
field = re.split(r'[(==)(>=)(<=)(in)(like)]', s)[0].strip()
# output is 'cou'
# for s = 'sum_EVENT_GENRE in [1,2,3,4,5]' the output is 'sum_EVENT_GENRE'

The second result is what I want.

My problem is that the string is split at any single character that appears in (in)(like), and I only get the first slice. For 'count_EVENT_GENRE' the split happens after "cou", because the next character, n, is one of the matching characters. The same happens for any string that contains a character from (in)(like).

Example: 'percentage_AMOUNT' gives 'p'

because the 'e' after 'p' is a matching character.

How can I make the split treat (in) and (like) as whole words rather than as individual characters?

Please suggest a syntax.

The [ ] matches only one character from the set inside it, so it means "either ( or = or ) or > or < or i or n or l or k or e". You probably mean (==|>=|<=|in|like)
– cdarke, Commented Jul 20, 2016 at 7:03
Use debuggex.com for regex edits. It gives a visual representation.
– Rahul K P, Commented Jul 20, 2016 at 7:10
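
To make the point in the first comment concrete, here is a small sketch (not part of the original thread) that lists the characters the bracket expression actually contains and shows where the question's 'percentage_AMOUNT' example gets split. The variable names are only illustrative.

```python
import re

pattern = r'[(==)(>=)(<=)(in)(like)]'

# Inside [ ], the parentheses and repeated characters have no grouping
# meaning: the class is just the set of distinct characters written in it.
class_chars = set('(==)(>=)(<=)(in)(like)')
print(sorted(class_chars))
# ['(', ')', '<', '=', '>', 'e', 'i', 'k', 'l', 'n']

# Every lowercase 'e' and 'n' is therefore a split point.
pieces = re.split(pattern, 'percentage_AMOUNT')
print(pieces)  # ['p', 'rc', '', 'tag', '_AMOUNT']
```

The uppercase N in AMOUNT is not a split point, because the class is case-sensitive.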

Wiktor Stribiżew · Accepted Answer · 2016-07-20 07:33:23Z

1

To answer your question: [(==)(>=)(<=)(in)(like)] is a character class, so it matches any single character listed inside it. To match sequences of characters, remove the [ and ] and use alternation:

r'==?|>=?|<=?|\b(?:in|like)\b'

or better:

r'[=><]=?|\b(?:in|like)\b'

Your code would look like:

import re
ss = ['count_EVENT_GENRE in [1,2,3,4,5]','coint_EVENT_GENRE = "ROMANCE"']
for s in ss:
    field = re.split(r'[=><]=?|\b(?:in|like)\b', s)[0].strip()
    print(field)

However, depending on the actual specification, there may be easier or safer ways to get what you want: split on a space and take the first item, or use re.match with r'\w+' or r'[a-z]+(?:_[A-Z]+)+'.

If your value is at the start of the string, starts with lowercase ASCII letters, and may then contain any number of sequences of _ followed by uppercase ASCII letters, use:

re.match(r'[a-z]+(?:_[A-Z]+)*', s)

Full demo code:

import re
ss = ['count_EVENT_GENRE in [1,2,3,4,5]','coint_EVENT_GENRE = "ROMANCE"']
for s in ss:
    fieldObj = re.match(r'[a-z]+(?:_[A-Z]+)*', s)
    if fieldObj:
        print(fieldObj.group())
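
The re.match approach could be wrapped in a small helper, sketched below. The names FIELD_RE and extract_field are hypothetical, and the lstrip() call is an added assumption (it is not in the answer) so that input with leading whitespace is still handled.

```python
import re

# Pattern from the answer: lowercase letters, then zero or more
# groups of "_" followed by uppercase letters.
FIELD_RE = re.compile(r'[a-z]+(?:_[A-Z]+)*')

def extract_field(expr):
    """Return the leading field name, or None if expr does not start with one."""
    # Assumption: leading whitespace should be ignored before matching.
    m = FIELD_RE.match(expr.lstrip())
    return m.group() if m else None

print(extract_field('percentage_AMOUNT >= 10'))        # percentage_AMOUNT
print(extract_field(' coint_EVENT_GENRE = "ROMANCE"'))  # coint_EVENT_GENRE
print(extract_field('[1,2,3] in x'))                    # None
```

Note that the pattern matches as much of the prefix as fits, so an input such as 'count_event in [1]' would yield 'count' rather than None. If that matters, the pattern needs to be adapted to the real naming rules.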

edited Jul 20, 2016 at 7:33

answered Jul 20, 2016 at 7:08

Wiktor Stribiżew


2 Comments

Satya Over a year ago

Consider s = ' coint_EVENT_GENRE = "ROMANCE"'. Note that coint contains 'in', so when I apply r'[=><]=|in|like' I get 'co', because the string is split at 'in' (co + in + t). So r'\w+' would be better. But how can I avoid that with r'[=><]=|in|like', for example by adding \w+ or \w to that pattern?

Wiktor Stribiżew Over a year ago

Use word boundaries: r'[=><]=?|\b(?:in|like)\b'. I also see that you have a single =, so you need to make the second = optional by appending ? after it. Why not use re.match(r'[a-z]+(?:_[A-Z]+)*', s)?
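
The effect of the word boundaries discussed in these comments can be checked with a short sketch. The helper name first_field is hypothetical, and the plain pattern here uses the optional = so that only the \b part differs between the two.

```python
import re

plain = r'[=><]=?|in|like'              # keywords match anywhere, even inside words
bounded = r'[=><]=?|\b(?:in|like)\b'    # keywords match only as whole words

def first_field(pattern, s):
    """Split s on pattern and return the stripped first piece."""
    return re.split(pattern, s)[0].strip()

s = ' coint_EVENT_GENRE = "ROMANCE"'
print(first_field(plain, s))    # co
print(first_field(bounded, s))  # coint_EVENT_GENRE

# "_" counts as a word character, so a field that starts with "in_"
# is not split either; only the standalone " in " is.
print(first_field(bounded, 'in_STOCK in [1,2]'))  # in_STOCK
```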

Frodon · Answer · 2016-07-20 07:08:47Z

1

If you want only the first word of your string, then this should do the job:

import re
s = 'count_EVENT_GENRE in [1,2,3,4,5]'
field = re.split(r'\W', s)[0]
# count_EVENT_GENRE
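
One caveat, sketched here as an addition to the answer: if the string starts with whitespace (as in the ' coint_EVENT_GENRE = "ROMANCE"' example from the comments on another answer), splitting on \W returns an empty first item. Searching for the first run of word characters with re.search(r'\w+', ...) is one possible way around that.

```python
import re

s = ' coint_EVENT_GENRE = "ROMANCE"'   # note the leading space

# The leading space is itself a \W match, so the first piece is empty.
first_piece = re.split(r'\W', s)[0]
print(repr(first_piece))  # ''

# Looking for the first run of word characters skips the leading space.
m = re.search(r'\w+', s)
first_word = m.group() if m else None
print(first_word)  # coint_EVENT_GENRE
```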

answered Jul 20, 2016 at 7:08

Frodon


Rolf of Saxony · Answer · 2016-07-20 07:28:57Z

1

Is there anything wrong with using split?

>>> s = 'count_EVENT_GENRE in [1,2,3,4,5]'
>>> s.split(' ')[0]
'count_EVENT_GENRE'
>>> s = 'coint_EVENT_GENRE = "ROMANCE"'
>>> s.split(' ')[0]
'coint_EVENT_GENRE'
>>>

answered Jul 20, 2016 at 7:28

Rolf of Saxony


4 Comments

Wiktor Stribiżew Over a year ago

I actually mentioned this in my answer (see splitting with space and getting the first item). However, split does not make sure the substring meets a specific pattern, and if you supply an arbitrary string, its first non-whitespace chunk will always be returned.

Satya Over a year ago

Nicely done and the simplest. Thanks. But I have to apply some more parsing to the output and have more categories, for which @Wiktor's answer gave me insight for the future. So I have to accept Wiktor's answer. I upvoted your solution. Thanks again.

Rolf of Saxony Over a year ago

@WiktorStribiżew Sorry Wiktor, I didn't see your reference to split

Wiktor Stribiżew Over a year ago

It is perfectly ok, and no need to feel sorry, just when the idea is so basic, I tend to just post a comment. In your shoes (I mean if I had the same rep amount), I'd probably also post this answer :)
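
The point raised in these comments, that split returns the first chunk whatever it looks like, can be illustrated with a sketch. The helper first_chunk_checked is hypothetical, and the \w+ check is only one possible validation rule, not the one required by the question.

```python
import re

# split() hands back the first chunk regardless of its shape.
print('[1,2,3] in x'.split(' ')[0])          # [1,2,3]
print('count_EVENT_GENRE>=5'.split(' ')[0])  # count_EVENT_GENRE>=5

def first_chunk_checked(s):
    """Return the first whitespace-separated chunk only if it is a plain word."""
    chunks = s.split()
    # Assumption: a valid field consists solely of word characters.
    if chunks and re.fullmatch(r'\w+', chunks[0]):
        return chunks[0]
    return None

print(first_chunk_checked('count_EVENT_GENRE in [1,2,3,4,5]'))  # count_EVENT_GENRE
print(first_chunk_checked('[1,2,3] in x'))                      # None
```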
