I'm having a hard time understanding Regular Expressions in Python.

else:
    #REGEX1
    ret = re.search(r'name:(?P<scname>)',line) 
    if(ret != None):
        print('The  name is'+ret.group("scname"))
    else:
    #REGEX2
    ret = re.search(r'(?P<content>)',line)
    print('The content is'+ret.group("content"))

I'm parsing a text file with the following content

name:english
1001Nights 
A Night at the Call Center
Grammar
name:science
Engineering
Biology
Physics
name:maths
Algebra
Geometry

I want the output to be

The name is english
The content is 1001Nights
The content is A Night at the Call Center
The content is Grammar
The name is science
The content is Engineering
The content is Biology

Please help me correct my regex, and suggest any link that would make regular expressions easier to understand. The official documentation feels a bit overwhelming since I'm new to Python.

UPDATE

This is the error I get, if it helps:

The subclient name is
Traceback (most recent call last):
  File "create&&bkp.py", line 32, in <module>
    print('The subclient name is'+ret.group("scname"))
IndexError: no such group

4 Answers

ret = re.search(r'name:(?P<scname>)',line) 

This searches for 'name:' somewhere in the line (not necessarily at the beginning), and if found, produces a match object with a group at the position after the colon. Since there's nothing between the > and ), this group is empty, but it does have the name scname. Thus the code snippet you've shown doesn't match the error. Other mismatches include the printing of part of the string before the error and the word "subclient".
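A minimal sketch of the empty-group behaviour described above, using a sample line from the question's file. The variable names `empty` and `fixed` are only for illustration:

```python
import re

line = "name:english"

# The named group has an empty body, so it matches the empty string
# right after the colon. The match succeeds and the group exists.
empty = re.search(r'name:(?P<scname>)', line)
print(repr(empty.group("scname")))

# Putting .* inside the group captures the rest of the line instead.
fixed = re.search(r'name:(?P<scname>.*)', line)
print(repr(fixed.group("scname")))
```

The first call prints an empty string rather than raising IndexError, which is why the traceback in the question must come from different code than the snippet shown.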

I would consider simple string processing:

for line in lines:
    line=line.rstrip('\n')    # assuming it came from a file, remove newline
    if line.startswith('name:'):
        print('The name is '+line[len('name:'):])
    else:
        print('The content is '+line)

It's also possible to do the entire classification using the regex:

import re

matcher=re.compile(r'^(name:(?P<name>.*)|(?P<content>.*))$')
for line in lines:
    m=matcher.match(line)
    # groupdict() returns a dict; iterate over items() to get key/value pairs
    for key,value in m.groupdict().items():
        if value is not None:
            print('The {} is {}'.format(key,value))

You don't need a regex if your file is in the format posted:

with open("in.txt") as f:
    for line in f:
        if "name:" in line:
            print("The name is {}".format(line.rstrip().split("name:",1)[1]))
        else:
            print("The content is {}".format(line.rstrip()))

Output:

The name is english
The content is 1001Nights
The content is A Night at the Call Center
The content is Grammar
The name is science
The content is Engineering
The content is Biology
The content is Physics
The name is maths
The content is Algebra
The content is Geometry

2 Comments

Thanks @Padraic, but a regex-based search is more robust, right? What if the content had ":" in its name? :)
It would make no difference; we only split when the line contains the string name:
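A small sketch of the point made in the comments, under the assumption that the sample lines "Physics: an introduction" and "nickname:Bob" (both hypothetical, not from the question's file) could appear as content. A bare colon in a content line is harmless, but a content line that happens to contain name: somewhere is treated as a name line by the `in` test; checking the prefix with `startswith` avoids that:

```python
def classify_in(line):
    # Mirrors the answer's test: "name:" anywhere in the line
    line = line.rstrip()
    if "name:" in line:
        return "The name is " + line.split("name:", 1)[1]
    return "The content is " + line

def classify_prefix(line):
    # Stricter variant: only a leading "name:" marks a name line
    line = line.rstrip()
    if line.startswith("name:"):
        return "The name is " + line[len("name:"):]
    return "The content is " + line

for sample in ["name:english", "Physics: an introduction", "nickname:Bob"]:
    print(classify_in(sample), "|", classify_prefix(sample))
```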
(?<=:)(.*)$

This would be your regex1. See demo.

http://regex101.com/r/iZ9sO5/8

^(?!.*?:)(.*)$

This would be your regex2. See demo.

http://regex101.com/r/iZ9sO5/9
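In Python the two patterns have to be passed as string literals (raw strings are the usual choice). The following is only a sketch of how they might be wired together; the helper name `classify` is hypothetical. Note that these patterns treat any line containing a colon as a name line, whatever precedes the colon:

```python
import re

# The two patterns from this answer, written as raw string literals
name_re = re.compile(r'(?<=:)(.*)$')        # text after the first colon
content_re = re.compile(r'^(?!.*?:)(.*)$')  # whole line, only if it has no colon

def classify(line):
    line = line.rstrip('\n')
    m = content_re.search(line)
    if m:
        return 'The content is ' + m.group(1)
    m = name_re.search(line)
    if m:
        return 'The name is ' + m.group(1)
    return None

for sample in ['name:english', '1001Nights', 'Grammar']:
    print(classify(sample))
```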

2 Comments

Thanks. Will check it out and let you know :) :)
File "create&&bkp.py", line 27 ret = re.search((?<=:)(.*)$,line) ^ SyntaxError: invalid syntax
else:
    #REGEX1
    ret = re.search(r'name:(.*)$',line)
    if ret is not None:
        print('The name is '+ret.group(1))
    else:
        #REGEX2
        # ret = re.search(r'(?P<content>)',line)
        # no regex needed: any line without 'name:' is content
        print('The content is '+line)
