I have a Python file like this:

test.py 

import os
class test():

    def __init__(self):
        pass

    def add(num1, num2):
        return num1+num2

I am reading this file into a string like this:

with open('test.py', 'r') as myfile:
    data=myfile.read()

print(data)

Now data contains the whole file as one string, including the newlines. I need to find the lines that start with class or def.

For example, I need the output to be printed as:

class test():
def __init__(self):
def add(num1, num2):

How can I process this using regular expressions?

  • What do you mean process? Commented Aug 4, 2016 at 9:02
  • I need to process the string data to get the output as shown Commented Aug 4, 2016 at 9:04
  • @GáborErdős I believe he means Regex. OP: What's your motivation? Commented Aug 4, 2016 at 9:04
  • I can parse the file line by line, but it's not an efficient way. It would be better to get the data into a string and use a regex on it to find the lines. Commented Aug 4, 2016 at 9:05
  • Use re.findall(r'(?m)^[ \t]*((?:class|def) .*)', data). Regex demo. Commented Aug 4, 2016 at 9:16

3 Answers

If you want to follow a regex approach, use

re.findall(r'(?m)^[ \t]*((?:class|def)[ \t].*)', data)

or

re.findall(r'^[ \t]*((?:class|def)[ \t].*)', data, flags=re.M)

See regex demo

The point is to use ^ as the start-of-line anchor (hence the (?m) inline flag or the re.M flag is required). The pattern then matches optional horizontal whitespace ([ \t]*), then either class or def ((?:class|def)), then a single space or tab ([ \t]), and finally zero or more characters other than a newline (.*). Only the part inside the capturing group, from the keyword to the end of the line, is returned by re.findall.

If you also plan to handle Unicode whitespace, replace [ \t] with [^\S\r\n\f\v]. In Python 2 you also need the re.UNICODE flag; in Python 3, str patterns are Unicode-aware by default.

Python demo:

import re
p = re.compile(r'^[ \t]*((?:class|def)[ \t].*)', re.MULTILINE)
s = "test.py \n\nimport os\nclass test():\n\n    def __init__(self):\n        pass\n\n    def add(num1, num2):\n        return num1+num2"
print(p.findall(s))
# => ['class test():', 'def __init__(self):', 'def add(num1, num2):']
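For the Unicode whitespace case mentioned above, a minimal sketch might look like the following. It assumes Python 3, where str patterns are Unicode-aware without any extra flag. The names HSPACE and UNICODE_DEF_RE and the sample string are made up for illustration and are not part of the original answer.

```python
import re

# [^\S\r\n\f\v] = any whitespace character except the line-break-like ones.
HSPACE = r'[^\S\r\n\f\v]'
UNICODE_DEF_RE = re.compile(
    r'^' + HSPACE + r'*((?:class|def)' + HSPACE + r'.*)',
    re.MULTILINE,
)

# Hypothetical input: the first line is indented with a no-break space
# and an em space, the second with a plain tab.
sample = "\u00a0\u2003class A:\n\tdef f(self):\n"
matches = UNICODE_DEF_RE.findall(sample)
print(matches)
```

With [ \t] in place of HSPACE, the first line of this sample would not match, because its indentation is not made of spaces or tabs.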

4 Comments

Can't you just use \s for the whitespaces?
No, \s matches a newline, but data is the whole file containing newlines as well. Perhaps, it is ok, but I do not have more input data to test it out.
Well, I think you could use \s+? for non-greedy match and ^...$ to restrict it to one line.
The $ is redundant since .* will match up to the line end. The ^ is used in my pattern. As for lazy quantifier with \s, there is really no need of it at all, it can be greedy.
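To illustrate the \s concern from this thread, here is a small sketch comparing \s with [ \t] on a whole-file string. The input is a contrived example (a bare "def" at the end of a line), chosen only to show that \s can cross a line break where [ \t] cannot.

```python
import re

# Hypothetical input: a stray "def" at the end of a line, then unrelated code.
data = "def\nx = 1\n"

# \s may consume the newline after "def", so the match spills onto the next line.
with_s = re.findall(r'^\s*((?:class|def)\s.*)', data, flags=re.M)

# [ \t] only accepts a space or tab after the keyword, so nothing matches.
with_hspace = re.findall(r'^[ \t]*((?:class|def)[ \t].*)', data, flags=re.M)

print(with_s)
print(with_hspace)
```

On ordinary source files the two patterns usually agree; the difference only shows up in edge cases like this one.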

So if you need to find all def and class lines it is much easier to avoid regex.

You read the whole content of the file here

with open('test.py', 'r') as myfile:
    data=myfile.read()

print(data)

Why don't you just find the answer right there?

with open('test.py', 'r') as myfile:
    for line in myfile:
        stripped = line.strip()  # get rid of spaces left and right
        if stripped.startswith('def') or stripped.startswith('class'):
            print(line)
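A slightly stricter variant of the loop above could look like this. It is a sketch, not the answer's original code: the function name find_definitions and the sample lines are assumptions for illustration. It passes a tuple to startswith and includes a trailing space in each prefix, so that a line starting with something like "defined" is not picked up.

```python
def find_definitions(lines):
    """Return the stripped lines that start with 'class ' or 'def '."""
    found = []
    for line in lines:
        stripped = line.strip()  # drop indentation and the trailing newline
        # str.startswith accepts a tuple of prefixes; the trailing space
        # keeps identifiers such as 'defined' or 'classes' from matching.
        if stripped.startswith(("class ", "def ")):
            found.append(stripped)
    return found


# Hypothetical sample; with a real file, pass the open file object instead.
sample = [
    "import os\n",
    "class test():\n",
    "    def __init__(self):\n",
    "        pass\n",
    "defined = True\n",
    "    def add(num1, num2):\n",
]
result = find_definitions(sample)
print("\n".join(result))
```

Because a file object yields its lines one at a time, calling find_definitions(myfile) inside the with block works the same way as with the list used here.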

To work with a whole string as you requested:

import re
with open('test.py', 'r') as myfile:
    data = myfile.read()

print(data)

print(re.findall("class.+\n|def.+\n",data))

As the comments point out, this will also match text such as 'defined as bla bla'. So it is better to use

print(re.findall("class .+\n|def .+\n",data))
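The pattern with the added space still is not tied to the start of a line. A short sketch of the difference, using the sample line quoted in the comments and the anchored pattern from the other answer (the variable names are only for illustration):

```python
import re

# Sample line from the comment thread; it does not contain a definition.
s = 's = "sometext" // my cool def here\n'

# Unanchored: "def " is found in the middle of the line.
unanchored = re.findall("class .+\n|def .+\n", s)

# Anchored with ^ and re.M: the keyword must follow optional indentation
# at the start of a line, so nothing is returned here.
anchored = re.findall(r'^[ \t]*((?:class|def)[ \t].*)', s, flags=re.M)

print(unanchored)
print(anchored)
```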

9 Comments

I don't want to search line by line. My intention is to search inside the complete file contents as a single string.
@sam I'm sorry, but this is exactly what it is doing. Reads a file, rolls through it, yields the results you're searching for.
@sam I also added a regex version that works on the full string, if you really want that.
The "class.+\n|def.+\n" regex will match defined here in s = "sometext" // s defined here\n (see demo) and "class .+\n|def .+\n" will find def here in s = "sometext" // my cool def here\n (demo).
1) You can use startswith(("def", "class")), 2) You should add a space behind both; 3) same for regex, and it does not even check whether it's at the start of the line.
import re

with open('test.py', 'r') as myfile:
    data = myfile.read().split('\n')
    for line in data:
        # re.match anchors at the start of the line: optional indentation,
        # then the keyword followed by a space.
        if re.match(r"\s*(?:class|def) ", line):
            print(line)
