1

I have a string:

import re
s = 'count_EVENT_GENRE in [1,2,3,4,5]'
# I need to capture only the field name, 'count_EVENT_GENRE'
field = re.split(r'[(==)(>=)(<=)(in)(like)]', s)[0].strip()
# the output is 'cou'
# for s = 'sum_EVENT_GENRE in [1,2,3,4,5]' the output is 'sum_EVENT_GENRE'

which is fine.

My problem is that the string is split at any single character that appears in (in)(like), and I get the first slice. For 'count_EVENT_GENRE', the result is "cou" because the next character, n, is one of those characters. The same thing happens for any string that contains one of them.

Example: 'percentage_AMOUNT' gives 'p', because the e after p matches.

How can I make (in) and (like) be treated as whole words rather than as individual characters when splitting?

Please suggest a syntax.

5 Comments
  • Perhaps you need r'[=><]=|in|like'. Commented Jul 20, 2016 at 7:02
  • What is the desired output for this input? Why? Commented Jul 20, 2016 at 7:03
  • The [ ] matches only one character from that list, so it means "either ( or = or ) or > or < or i or n or l or k or e". You probably mean (==|>=|<=|in|like). Commented Jul 20, 2016 at 7:03
  • Perhaps use re.findall(r'^\w+', s)[0] instead? Commented Jul 20, 2016 at 7:05
  • Use debuggex.com for regex edits. It gives a visual representation. Commented Jul 20, 2016 at 7:10

3 Answers

1

To answer your question: [(==)(>=)(<=)(in)(like)] is a character class, so it matches any single one of the characters listed inside the brackets; the parentheses do not group anything there. To match sequences of characters, remove the [ and ] and use alternation:

r'==?|>=?|<=?|\b(?:in|like)\b'

or better:

r'[=><]=?|\b(?:in|like)\b'

Your code would look like:

import re
ss = ['count_EVENT_GENRE in [1,2,3,4,5]','coint_EVENT_GENRE = "ROMANCE"']
for s in ss:
    field = re.split(r'[=><]=?|\b(?:in|like)\b', s)[0].strip()
    print(field)
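
To make the difference concrete, here is a minimal sketch that runs the original character class and the alternation pattern side by side. The helper name first_field and the sample strings are illustrative assumptions; maxsplit=1 is used only because nothing after the first separator is needed.

```python
import re

# The original pattern: a character class, so each listed character
# (including i, n, l, k and e) acts as a separator on its own.
CHAR_CLASS = re.compile(r'[(==)(>=)(<=)(in)(like)]')

# Alternation with word boundaries: "in" and "like" match only as whole words.
ALTERNATION = re.compile(r'[=><]=?|\b(?:in|like)\b')

def first_field(pattern, s):
    """Return the text before the first separator, stripped of whitespace."""
    return pattern.split(s, maxsplit=1)[0].strip()

samples = [
    'count_EVENT_GENRE in [1,2,3,4,5]',
    'percentage_AMOUNT >= 10',
    'coint_EVENT_GENRE = "ROMANCE"',
]
for s in samples:
    # Left: result with the character class; right: result with alternation.
    print(repr(first_field(CHAR_CLASS, s)), repr(first_field(ALTERNATION, s)))
```

With the character class, the three samples are cut at the first n, e, and i respectively; with the alternation, the full field name survives in each case.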

However, there may be other ways to get what you want that are easier or safer, depending on the actual specification: splitting on a space and taking the first item, or using re.match with r'\w+' or r'[a-z]+(?:_[A-Z]+)+', etc.

If the value is at the start of the string, starts with lowercase ASCII letters, and may be followed by any number of sequences of _ plus uppercase ASCII letters, use:

re.match(r'[a-z]+(?:_[A-Z]+)*', s)

Full demo code:

import re
ss = ['count_EVENT_GENRE in [1,2,3,4,5]','coint_EVENT_GENRE = "ROMANCE"']
for s in ss:
    fieldObj = re.match(r'[a-z]+(?:_[A-Z]+)*', s)
    if fieldObj:
        print(fieldObj.group())
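
Note that re.match only matches at the very start of the string, so an input with leading whitespace would produce no match. A minimal sketch of one way to tolerate that, assuming leading whitespace should simply be skipped (the helper name extract_field and that policy are assumptions, not part of the original requirements):

```python
import re

# Hypothetical helper: optional leading whitespace, then the field name
# captured in group 1 (lowercase letters plus any _UPPERCASE segments).
FIELD_RE = re.compile(r'\s*([a-z]+(?:_[A-Z]+)*)')

def extract_field(s):
    """Return the leading field name, or None if s does not start with one."""
    m = FIELD_RE.match(s)
    return m.group(1) if m else None

print(extract_field(' coint_EVENT_GENRE = "ROMANCE"'))
print(extract_field('[1,2,3] in x'))
```

Returning None for a string that does not start with a field name lets the caller distinguish "no field found" from an empty result.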

2 Comments

Consider s = ' coint_EVENT_GENRE = "ROMANCE"'. Note that coint contains 'in', so when I apply r'[=><]=|in|like' I get 'co', because the string is split at 'in' (co+in+t). So r'\w+' would be better. But how can I avoid that with r'[=><]=|in|like', perhaps by adding \w+ or \w to the pattern?
Use word boundaries: r'[=><]=?|\b(?:in|like)\b'. I also see that you have a single =, so you need to make the second = optional by appending ? after it. Why not use re.match(r'[a-z]+(?:_[A-Z]+)*', s)?
1

If you want only the first word of your string, then this should do the job:

import re
s = 'count_EVENT_GENRE in [1,2,3,4,5]'
field = re.split(r'\W', s)[0]
# count_EVENT_GENRE


1

Is there anything wrong with using split?

>>> s = 'count_EVENT_GENRE in [1,2,3,4,5]'
>>> s.split(' ')[0]
'count_EVENT_GENRE'
>>> s = 'coint_EVENT_GENRE = "ROMANCE"'
>>> s.split(' ')[0]
'coint_EVENT_GENRE'
>>>
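
One caveat with s.split(' ') is that a leading space makes the first item an empty string. A small sketch of a variant that splits on any run of whitespace instead; the helper name first_token is an assumption for illustration. It still does not check that the token looks like a field name.

```python
def first_token(s):
    """Return the first whitespace-delimited token of s, or '' if there is none."""
    # split(None, 1) splits on any run of whitespace, ignores leading
    # whitespace, and stops after the first split.
    parts = s.split(None, 1)
    return parts[0] if parts else ''

padded = ' coint_EVENT_GENRE = "ROMANCE"'
print(repr(padded.split(' ')[0]))  # empty string: the text before the leading space
print(repr(first_token(padded)))
# No whitespace around the operator, so the whole string comes back as one token.
print(repr(first_token('count_EVENT_GENRE>=5')))
```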

4 Comments

I actually mentioned this in my answer (see "splitting with space and getting the first item"). However, split does not ensure that the substring matches a specific pattern; if you supply an arbitrary string, its first non-whitespace chunk will always be returned.
Nicely done, and the simplest. Thanks. But I have to apply some more parsing to the output and handle more categories, for which @Wiktor's answer gave me insight for the future. So I have to accept Wiktor's answer. I upvoted your solution. Thanks again.
@WiktorStribiżew Sorry Wiktor, I didn't see your reference to split
It is perfectly OK, and there is no need to feel sorry; when the idea is so basic, I tend to just post a comment. In your shoes (I mean, if I had the same rep amount), I'd probably also post this answer :)
