parsing python file with re

Question

I have a Python file:

test.py 

import os
class test():

    def __init__(self):
        pass

    def add(num1, num2):
        return num1+num2

I am reading this file into a string like this:

with open('test.py', 'r') as myfile:
    data = myfile.read()

print(data)

Now data contains the whole file as one string, newlines included. I need to find the lines that start with class or def.

For example, I need the output to be printed as:

class test():
def __init__(self):
def add(num1, num2):

How can I process this using regular expressions?

I need to process the string data to get the output as shown.
– sam, Commented Aug 4, 2016 at 9:04
@GáborErdős I believe he means regex. OP: What's your motivation?
– Mio Bambino, Commented Aug 4, 2016 at 9:04
I can parse the file line by line, but that's not efficient. I would rather read the data into a string and use a regex on it to find the lines.
– sam, Commented Aug 4, 2016 at 9:05
Use re.findall(r'(?m)^[ \t]*((?:class|def) .*)', data). Regex demo.
– Wiktor Stribiżew, Commented Aug 4, 2016 at 9:16

Wiktor Stribiżew · Accepted Answer · 2016-08-04 09:27:29Z

2

If you want to follow a regex approach, use

re.findall(r'(?m)^[ \t]*((?:class|def)[ \t].*)', data)

or

re.findall(r'^[ \t]*((?:class|def)[ \t].*)', data, flags=re.M)

See regex demo

The point is to use ^ as the start-of-line anchor (which is why (?m) at the start of the pattern or the re.M flag is necessary). After it, [ \t]* matches any horizontal whitespace, (?:class|def) matches either class or def, [ \t] matches one more space or tab, and .* matches zero or more characters other than a newline.
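A minimal sketch of the same pattern written with re.VERBOSE, so that each piece described above sits on its own commented line. The names PATTERN and sample are illustrative and not part of the original answer.

```python
import re

# Same pattern as above, spelled out with re.VERBOSE (names are illustrative).
PATTERN = re.compile(r"""
    ^                   # start of a line (because of re.MULTILINE)
    [ \t]*              # optional indentation: spaces or tabs only
    (                   # group 1: the text that findall returns
        (?:class|def)   # the keyword, in a non-capturing group
        [ \t]           # one space or tab after the keyword
        .*              # rest of the line; the dot stops at a newline
    )
""", re.MULTILINE | re.VERBOSE)

sample = "import os\nclass test():\n\n    def __init__(self):\n        pass\n"
matches = PATTERN.findall(sample)
print(matches)
```

In verbose mode, whitespace inside a character class such as [ \t] is still significant, so the pattern should behave the same as the one-line version.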

If you also plan to handle Unicode whitespace, replace [ \t] with [^\S\r\n\f\v] (and, in Python 2, pass the re.UNICODE flag; in Python 3, str patterns are Unicode-aware by default).
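To illustrate the difference, here is a small sketch for Python 3 that compares the two character classes. The input text is hypothetical: its second line is indented with no-break spaces (U+00A0), which [ \t] does not cover.

```python
import re

# ASCII-only indentation versus any horizontal Unicode whitespace.
ascii_ws = re.compile(r'^[ \t]*((?:class|def)[ \t].*)', re.M)
unicode_ws = re.compile(r'^[^\S\r\n\f\v]*((?:class|def)[^\S\r\n\f\v].*)', re.M)

# Hypothetical input: the def line is indented with two U+00A0 characters.
text = "class A:\n\u00a0\u00a0def f(self):\n"

ascii_hits = ascii_ws.findall(text)      # misses the def line
unicode_hits = unicode_ws.findall(text)  # finds both lines
print(ascii_hits)
print(unicode_hits)
```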

Python demo:

import re
p = re.compile(r'^[ \t]*((?:class|def)[ \t].*)', re.MULTILINE)
s = "test.py \n\nimport os\nclass test():\n\n    def __init__(self):\n        pass\n\n    def add(num1, num2):\n        return num1+num2"
print(p.findall(s))
# => ['class test():', 'def __init__(self):', 'def add(num1, num2):']

answered Aug 4, 2016 at 9:27

Wiktor Stribiżew


4 Comments

tobias_k Over a year ago

Can't you just use \s for the whitespaces?

Wiktor Stribiżew Over a year ago

No, \s matches a newline, but data is the whole file containing newlines as well. Perhaps, it is ok, but I do not have more input data to test it out.

tobias_k Over a year ago

Well, I think you could use \s+? for non-greedy match and ^...$ to restrict it to one line.

Wiktor Stribiżew Over a year ago

The $ is redundant since .* will match up to the line end. The ^ is already used in my pattern. As for a lazy quantifier with \s, there is really no need for it; it can be greedy.
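A small sketch of the newline concern raised in this thread, using a contrived input in which def sits alone on a line inside a docstring. The variable names and the input are assumptions made for illustration only.

```python
import re

with_s = re.compile(r'^\s*((?:class|def)\s.*)', re.M)
with_tab_space = re.compile(r'^[ \t]*((?:class|def)[ \t].*)', re.M)

# Contrived input: wrapped docstring text leaves "def" alone on a line.
text = '"""\ndef\nis a keyword\n"""\n'

s_hits = with_s.findall(text)             # \s consumes the newline after "def"
strict_hits = with_tab_space.findall(text)  # [ \t] refuses to cross the newline
print(s_hits)
print(strict_hits)
```

With \s after the keyword, the match runs onto the following line; with [ \t] it does not match at all, which is the behavior the answer is after.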

Gábor Erdős · Accepted Answer · 2016-08-04 09:20:13Z

2

If you need to find all def and class lines, it is much easier to avoid regex.

You already read the whole content of the file here:

with open('test.py', 'r') as myfile:
    data=myfile.read()

print(data)

Why don't you just find the answer right there?

with open('test.py', 'r') as myfile:
    for line in myfile:
        stripped = line.strip()  # get rid of whitespace left and right
        if stripped.startswith('def') or stripped.startswith('class'):
            print(stripped)  # print without the indentation and trailing newline

To work with a whole string as you requested:

import re
with open('test.py', 'r') as myfile:
    data = myfile.read()

print(data)

print(re.findall("class.+\n|def.+\n",data))

As you can see from the comments, this will match 'defined as bla bla' as well, so it is better to use

print(re.findall("class .+\n|def .+\n",data))
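A quick sketch comparing both patterns from this answer against a line-anchored one, on inputs adapted from the comments on this answer (with # as the comment marker). The variable names are illustrative.

```python
import re

loose = "class.+\n|def.+\n"       # first version from this answer
spaced = "class .+\n|def .+\n"    # second version, with a space after the keyword
anchored = r'^[ \t]*((?:class|def)[ \t].*)'  # anchored at the start of the line

line1 = 's = "sometext"  # s defined here\n'
line2 = 's = "sometext"  # my cool def here\n'

loose_hits = re.findall(loose, line1)       # matches inside "defined"
spaced_hits = re.findall(spaced, line2)     # still matches "def here"
anchored_hits = re.findall(anchored, line2, flags=re.M)
print(loose_hits, spaced_hits, anchored_hits)
```

Both unanchored patterns also require a trailing \n, so they would miss a definition on the last line of a file that does not end with a newline.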

edited Aug 4, 2016 at 9:20

answered Aug 4, 2016 at 9:10

Gábor Erdős


9 Comments

sam Over a year ago

I don't want to search line by line. My intention is to search inside the complete string of the file.

Mio Bambino Over a year ago

@sam I'm sorry, but this is exactly what it is doing. Reads a file, rolls through it, yields the results you're searching for.

Gábor Erdős Over a year ago

@sam I also added a version with regex on the full string, if you really want that.

Wiktor Stribiżew Over a year ago

The "class.+\n|def.+\n" regex will match defined here in s = "sometext" // s defined here\n (see demo) and "class .+\n|def .+\n" will find def here in s = "sometext" // my cool def here\n (demo).

tobias_k Over a year ago

1) You can use startswith(("def", "class")), 2) You should add a space behind both; 3) same for regex, and it does not even check whether it's at the start of the line.
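The first two suggestions in the comment above could look roughly like this. It is a sketch, not the answerer's code: find_definitions and source are hypothetical names, and the function takes any iterable of lines.

```python
def find_definitions(lines):
    """Return stripped lines that begin with 'class ' or 'def '."""
    found = []
    for line in lines:
        stripped = line.strip()
        # startswith accepts a tuple; the trailing space rejects "definitely"
        if stripped.startswith(("class ", "def ")):
            found.append(stripped)
    return found

source = (
    "import os\n"
    "class test():\n"
    "\n"
    "    def __init__(self):\n"
    "        pass\n"
    "\n"
    "    definitely = 1\n"
)
result = find_definitions(source.splitlines())
print(result)
```

Because iterating over an open file yields lines, passing the file object itself instead of source.splitlines() should work the same way.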


zenofsahil · Accepted Answer · 2016-08-04 09:21:32Z

1

import re

with open('test.py', 'r') as myfile:
    data = myfile.read().split('\n')
    for line in data:
        # match() anchors at the start of the line; \s* allows any indentation
        if re.match(r"\s*(?:class|def) ", line):
            print(line)

edited Aug 4, 2016 at 9:21

answered Aug 4, 2016 at 9:18

zenofsahil
