
I am using Python regex to extract certain values from a given string. This is my string:

mystring.txt

sometext
somemore    text here

some  other text

              course: course1
Id              Name                marks
____________________________________________________
1               student1            65
2               student2            75
3               MyName              69
4               student4            43

              course: course2
Id              Name                marks
____________________________________________________
1               student1            84
2               student2            73
8               student7            99
4               student4            32

              course: course4
Id              Name                marks
____________________________________________________
1               student1            97
3               MyName              60
8               student6            82

I need to extract the course name and the corresponding marks for a particular student. For example, I need the courses and marks for MyName from the string above.

I tried:

re.findall(".*?course: (\w+).*?MyName\s+(\d+).*?",buff,re.DOTALL)

But this works only if MyName is present under every course. It does not work if MyName is missing from some of the courses, as in my example string.

Here I get this output: [('course1', '69'), ('course2', '60')]

but what I actually want is: [('course1', '69'), ('course4', '60')]

What would be the correct regex for this?

#!/usr/bin/env python3
import re

with open("mystring.txt") as buffer_fp:
    buff = buffer_fp.read()

print(re.findall(r".*?course: (\w+).*?MyName\s+(\d+).*?", buff, re.DOTALL))
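A self-contained reproduction of the problem, assuming Python 3 and replacing the file read with an inline copy of the course tables (the name SAMPLE is only for illustration). Nothing stops the lazy .*? in front of MyName from running past the next "course:" header, so because course2 has no MyName row, its match continues into course4 and picks up that mark.

```python
import re

# Inline copy of the course tables from mystring.txt (illustrative; replaces the file read).
SAMPLE = """\
              course: course1
Id              Name                marks
____________________________________________________
1               student1            65
2               student2            75
3               MyName              69
4               student4            43

              course: course2
Id              Name                marks
____________________________________________________
1               student1            84
2               student2            73
8               student7            99
4               student4            32

              course: course4
Id              Name                marks
____________________________________________________
1               student1            97
3               MyName              60
8               student6            82
"""

# The question's pattern: the lazy ".*?" before MyName may cross a course header.
NAIVE = r".*?course: (\w+).*?MyName\s+(\d+).*?"

result = re.findall(NAIVE, SAMPLE, re.DOTALL)
print(result)  # [('course1', '69'), ('course2', '60')]
```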

2 Answers

.*?course: (\w+)(?:(?!\bcourse\b).)*MyName\s+(\d+).*?

                ^^^^^^^^^^^^^^^^^^^^

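You can try this (see the demo). Just use a lookahead-based quantifier: (?:(?!\bcourse\b).)* consumes a character only if the word "course" does not start at that position, so MyName has to be found before the next course header, i.e. inside the current course's table.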

https://regex101.com/r/pG1kU1/26
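A Python 3 sketch of this lookahead-based (tempered) quantifier, run against an inline copy of the sample tables (the name SAMPLE is illustrative). The pattern is written as a raw string so that \b reaches the regex engine as a word boundary.

```python
import re

# Inline copy of the course tables from the question (illustrative).
SAMPLE = """\
              course: course1
Id              Name                marks
____________________________________________________
1               student1            65
2               student2            75
3               MyName              69
4               student4            43

              course: course2
Id              Name                marks
____________________________________________________
1               student1            84
2               student2            73
8               student7            99
4               student4            32

              course: course4
Id              Name                marks
____________________________________________________
1               student1            97
3               MyName              60
8               student6            82
"""

# (?:(?!\bcourse\b).)* eats one character at a time, but never one that starts
# the word "course", so the match cannot leave the current course block.
PATTERN = r".*?course: (\w+)(?:(?!\bcourse\b).)*MyName\s+(\d+).*?"

result = re.findall(PATTERN, SAMPLE, re.DOTALL)
print(result)  # [('course1', '69'), ('course4', '60')]
```

For course2 the tempered part stops at the "course: course4" header without having seen MyName, so that attempt fails and the search moves on to course4 instead of pairing course2 with a mark from another table.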


5 Comments

What are the g and s flags? I understand s is equivalent to re.DOTALL. I thought g is for findall, but using this regex in Python code gives a different output.
But re.findall(".*?course: (\w+)(?:(?!\bcourse\b).)*MyName\s+(\d+).*?",buff,re.DOTALL) outputs: [('course1', '60')] :(
@Deepa print re.findall(r".*?course: (\w+)(?:(?!\bcourse\b).)*MyName\s+(\d+).*?",x,flags=re.DOTALL) I have tried this code and it's working for me.
Oops, sorry, I have been trying without the r prefix. I don't get what difference that makes :) The r prefix is for not translating escapes, right? I wonder why that affected the output.
@Deepa Python interprets \b as its own escape (the backspace character), but we want it to be a word boundary, so we have to use r.
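The effect of the r prefix can be checked directly. In an ordinary string literal "\b" is the single backspace character \x08, so the lookahead looks for a backspace that never occurs, never fires, and the quantifier behaves like .* and backtracks to the last MyName in the buffer. A sketch assuming Python 3 and the same illustrative inline SAMPLE; in the non-raw version \w, \s and \d are written with doubled backslashes only to avoid invalid-escape warnings on newer Pythons.

```python
import re

SAMPLE = """\
              course: course1
Id              Name                marks
____________________________________________________
1               student1            65
2               student2            75
3               MyName              69
4               student4            43

              course: course2
Id              Name                marks
____________________________________________________
1               student1            84
2               student2            73
8               student7            99
4               student4            32

              course: course4
Id              Name                marks
____________________________________________________
1               student1            97
3               MyName              60
8               student6            82
"""

print(len("\b"), "\b" == "\x08")  # 1 True  (a backspace character)
print(len(r"\b"))                 # 2       (backslash + "b": a word boundary to re)

# Raw string: (?!\bcourse\b) is a real word-boundary lookahead.
RAW = r".*?course: (\w+)(?:(?!\bcourse\b).)*MyName\s+(\d+).*?"

# No r prefix: "\b" has already become \x08 before re ever sees the pattern.
COOKED = ".*?course: (\\w+)(?:(?!\bcourse\b).)*MyName\\s+(\\d+).*?"

raw_result = re.findall(RAW, SAMPLE, re.DOTALL)
cooked_result = re.findall(COOKED, SAMPLE, re.DOTALL)
print(raw_result)     # [('course1', '69'), ('course4', '60')]
print(cooked_result)  # [('course1', '60')]
```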

I suspect this is impossible to do in a single regular expression. They are not all-powerful.

Even if you find a way, don't do this. Your non-working regex is already close to unreadable; a working solution is likely to be even more so. You can most likely do this in just a few lines of meaningful code. Pseudocode solution:

for line in buff:
    if it is a course line:
        set the course variable
    if it is a MyName line:
        add (course, marks) to the list of matches

Note that this could (and probably should) involve regexes in each of those if blocks. It's not a case of choosing between the hammer and the screwdriver to the exclusion of the other, but rather using them both for what they do best.
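One way the pseudocode above might look in Python 3, with a small regex in each if-block. This is a sketch: the helper name marks_for, the two patterns, and the inline SAMPLE text are illustrative assumptions, and the row pattern assumes every student line is "<id> <name> <marks>" separated by whitespace.

```python
import re

SAMPLE = """\
              course: course1
Id              Name                marks
____________________________________________________
1               student1            65
2               student2            75
3               MyName              69
4               student4            43

              course: course2
Id              Name                marks
____________________________________________________
1               student1            84
2               student2            73
8               student7            99
4               student4            32

              course: course4
Id              Name                marks
____________________________________________________
1               student1            97
3               MyName              60
8               student6            82
"""

# "is it a course line?" -> captures the course name
COURSE_LINE = re.compile(r"^\s*course:\s*(\w+)")


def marks_for(text, name):
    """Return (course, marks) pairs for every row whose Name column equals name."""
    # "is it a <name> line?" -> captures the marks column
    row = re.compile(r"^\s*\d+\s+" + re.escape(name) + r"\s+(\d+)\s*$")
    course = None
    matches = []
    for line in text.splitlines():
        m = COURSE_LINE.match(line)
        if m:
            course = m.group(1)  # remember which table we are in
            continue
        m = row.match(line)
        if m and course is not None:
            matches.append((course, m.group(1)))
    return matches


print(marks_for(SAMPLE, "MyName"))  # [('course1', '69'), ('course4', '60')]
```

Changing the requirements (another student, another column) only touches one of the two small patterns, which is the maintainability point this answer is making.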

3 Comments

Guess you underestimated regex :)
@vks I guess I did. But respectfully, your solution proves my point. That regex is illegible garbage - good luck to the OP if their requirements ever change and they need to try to fix it. It reads more like Perl than Python.
It's illegible garbage only for those who can't understand it :)
