Searching with Regex in Python

Question

I'm having a hard time understanding Regular Expressions in Python.

else:
    #REGEX1
    ret = re.search(r'name:(?P<scname>)',line) 
    if(ret != None):
        print('The  name is'+ret.group("scname"))
    else:
    #REGEX2
    ret = re.search(r'(?P<content>)',line)
    print('The content is'+ret.group("content"))

I'm parsing a text file with the following content:

name:english
1001Nights 
A Night at the Call Center
Grammar
name:science
Engineering
Biology
Physics
name:maths
Algebra
Geometry

I want the output to be:

The name is english
The content is 1001Nights
The content is A Night at the Call Center
The content is Grammar
The name is science
The content is Engineering
The content is Biology

Please help me correct my regex, and suggest any links that make regular expressions easier to understand. The official documentation feels a bit overwhelming since I'm new to Python.

UPDATE

This is the error I get, if it helps:

The subclient name is
Traceback (most recent call last):
  File "create&&bkp.py", line 32, in <module>
    print('The subclient name is'+ret.group("scname"))
IndexError: no such group

Yann Vernier · Accepted Answer · 2014-11-05 12:00:01Z

ret = re.search(r'name:(?P<scname>)',line)

This searches for 'name:' somewhere in the line (not necessarily at the beginning), and if found, produces a match object with a group at the position after the colon. Since there's nothing between the > and ), this group is empty, but it does have the name scname. Thus the code snippet you've shown doesn't match the error. Other mismatches include the printing of part of the string before the error and the word "subclient".
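To make the point about the empty group concrete, here is a minimal sketch. The sample line comes from the question's input; the group name "missing" is a hypothetical stand-in for any name the pattern does not define, which is the situation that would produce the IndexError shown in the traceback.

```python
import re

line = "name:english"

# Empty named group: the pattern matches, but the group captures nothing.
empty = re.search(r'name:(?P<scname>)', line)
print(repr(empty.group("scname")))

# Putting .* inside the group captures the rest of the line.
filled = re.search(r'name:(?P<scname>.*)', line)
print(repr(filled.group("scname")))

# Asking for a group name the pattern does not define raises IndexError.
# "missing" is a hypothetical name used only for illustration.
try:
    empty.group("missing")
except IndexError as exc:
    print("IndexError:", exc)
```

So the pattern as posted would print an empty name rather than fail; the traceback suggests the script that actually ran used a pattern without a group called scname.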

I would consider simple string processing:

for line in lines:
    line=line.rstrip('\n')    # assuming it came from a file, remove newline
    if line.startswith('name:'):
        print('The name is '+line[len('name:'):])
    else:
        print('The content is '+line)

It's also possible to do the entire classification using the regex:

import re

matcher = re.compile(r'^(name:(?P<name>.*)|(?P<content>.*))$')
for line in lines:
    m = matcher.match(line)
    # groupdict() returns a dict, so iterate over its (key, value) pairs
    for key, value in m.groupdict().items():
        if value is not None:
            print('The {} is {}'.format(key, value))
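To see why the loop checks for None, it may help to look at what groupdict() returns for each kind of line. This is only a sketch using two sample lines from the question; exactly one of the two named groups takes part in any match, and the other is reported as None.

```python
import re

matcher = re.compile(r'^(name:(?P<name>.*)|(?P<content>.*))$')

# Two sample lines taken from the question's input file.
samples = ["name:english", "Grammar"]

for line in samples:
    # Only the alternative that matched fills its group; the other is None.
    print(matcher.match(line).groupdict())
```

The first line fills the name group and leaves content as None; the second does the opposite, which is what lets the group name double as the label in the printed message.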

Padraic Cunningham · Accepted Answer · 2014-11-05 12:17:37Z


You don't need a regex if your file is in the format posted:

with open("in.txt") as f:
    for line in f:
        if "name:" in line:
            print("The name is {}".format(line.rstrip().split("name:",1)[1]))
        else:
            print("The content is {}".format(line.rstrip()))

Output:

The name is english
The content is 1001Nights
The content is A Night at the Call Center
The content is Grammar
The name is science
The content is Engineering
The content is Biology
The content is Physics
The name is maths
The content is Algebra
The content is Geometry

edited Nov 5, 2014 at 12:17

answered Nov 5, 2014 at 11:37

Padraic Cunningham


2 Comments

Dhiwakar Ravikumar Over a year ago

Thanks @Padraic, but a regex-based search is more robust, right? What if the content had ":" in its name? :)

Padraic Cunningham Over a year ago

It would make no difference; we only split when the line contains the string name:

vks · Accepted Answer · 2014-11-05 11:29:17Z


(?<=:)(.*)$

This would be your regex1. See demo.

http://regex101.com/r/iZ9sO5/8

^(?!.*?:)(.*)$

This would be your regex2. See demo.

http://regex101.com/r/iZ9sO5/9
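In Python, both patterns have to be passed to re as string literals, ideally raw strings. A rough sketch of how they might be wired together follows; the names name_re, content_re and classify are hypothetical, chosen only for this example. Note that regex1 matches the text after any colon, not specifically after name:, so it relies on content lines containing no colon.

```python
import re

# regex1: everything after a colon (any colon, not only "name:")
name_re = re.compile(r'(?<=:)(.*)$')
# regex2: a whole line that contains no colon at all
content_re = re.compile(r'^(?!.*?:)(.*)$')

def classify(line):
    """Return the message for one line, or None if neither pattern matches."""
    m = name_re.search(line)
    if m is not None:
        return 'The name is ' + m.group(1)
    m = content_re.search(line)
    if m is not None:
        return 'The content is ' + m.group(1)
    return None

for sample in ["name:english", "Grammar"]:
    print(classify(sample))
```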

answered Nov 5, 2014 at 11:29

vks


2 Comments

Dhiwakar Ravikumar Over a year ago

Thanks. will check it out and let you know :) :)

Dhiwakar Ravikumar Over a year ago

File "create&&bkp.py", line 27 ret = re.search((?<=:)(.*)$,line) ^ SyntaxError: invalid syntax

Mauro Baraldi · Accepted Answer · 2014-11-05 12:04:07Z


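else:
    # REGEX1: capture everything after "name:" up to the end of the line
    ret = re.search(r'name:(.*)$', line)
    if ret is not None:
        print('The name is ' + ret.group(1))
    else:
        # REGEX2 is not needed: any line without "name:" is content
        # ret = re.search(r'(?P<content>)',line)
        print('The content is ' + line)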

edited Nov 5, 2014 at 12:04

Mauro Baraldi


answered Nov 5, 2014 at 11:40

coo

