3

I can extract a particular pattern by reading the mystring.txt file line by line and checking each line with re.search(r'pattern', line_txt).

The following is mystring.txt:

`

Client: //home/SCM/dev/applications/build_system/test_suite_linux/unit_testing



Stream: //MainStream/testing_branch

Options:    dir, norm accel, ddl



SubmitOptions:  vis, dir, cas, cat

`

Using Python, I can get the stream name, //MainStream/testing_branch:

import re
with open("mystring.txt", 'r') as f:
    mystring = f.readlines()
    for line in mystring:
        # Only the line that starts with "Stream:" is of interest
        if re.search(r'^Stream:', line):
            stream_name = line.split('\t')[1]
            print(stream_name)

Instead of going line by line in a loop, how can I extract the same information using only the re module?

1
  • Use f.read() for the whole buffer, then try re.search(). Commented May 19, 2016 at 17:54

3 Answers

4

You can read the file in one go and use re.findall (beware: if the file is very large, loading it all into main memory is not a good idea).

import re
# Read the whole file at once; the with block closes it afterwards
with open("input_file") as f:
    content = f.read()
print(re.findall("^Stream: (.*)", content, re.M))
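To see what re.findall returns here, the following is a minimal sketch that runs against an inlined copy of the sample text instead of a file. The names SAMPLE and find_streams are made up for illustration, and the pattern allows either spaces or a tab after "Stream:" on the assumption that the real file may use a tab, as the split('\t') in the question suggests.

```python
import re

# Inlined copy of the sample from the question, so the sketch runs without a file.
SAMPLE = (
    "Client: //home/SCM/dev/applications/build_system/test_suite_linux/unit_testing\n"
    "\n"
    "Stream: //MainStream/testing_branch\n"
    "\n"
    "Options:    dir, norm accel, ddl\n"
    "\n"
    "SubmitOptions:  vis, dir, cas, cat\n"
)

def find_streams(text):
    """Return the value of every line that starts with 'Stream:' (hypothetical helper)."""
    # re.MULTILINE lets ^ and $ match at the start and end of each line.
    # [ \t]* skips spaces or a tab after the colon without crossing a newline.
    return re.findall(r"^Stream:[ \t]*(.*)$", text, re.MULTILINE)

streams = find_streams(SAMPLE)
print(streams)
```

Because the pattern has one capture group, findall returns a list of the captured values, one per matching line, and an empty list when nothing matches.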

1 Comment

fyi. More depth on re.search() vs re.findall() here: stackoverflow.com/a/37330608/605356
2

Yes, you can use re.MULTILINE with re.search(..).

>>> import re
>>> re.search(r'^Stream\:\s([^\n]+)', f.read(), re.MULTILINE).group(1)
'//MainStream/testing_branch'
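One caveat with chaining .group(1) directly: re.search returns None when nothing matches, so the call would raise AttributeError on a file without a Stream line. A minimal sketch of a guarded version follows; extract_stream is a hypothetical helper name and the text is passed in as a string rather than read from a file.

```python
import re

def extract_stream(text):
    """Return the stream name from the text, or None if there is no Stream line."""
    match = re.search(r"^Stream:\s([^\n]+)", text, re.MULTILINE)
    # re.search returns None when there is no match, so check before calling group()
    return match.group(1) if match else None

sample = "Client: //some/client\nStream: //MainStream/testing_branch\nOptions: dir\n"
print(extract_stream(sample))
print(extract_stream("Client: //some/client\n"))
```

Usage would then be extract_stream(f.read()) inside a with open(...) block.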

4 Comments

This is what I was looking for. Could you please explain what ([^\n]+) does? Thanks.
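Outside brackets, ^ matches the start of the string (or of each line with re.MULTILINE), whereas $ matches the end. Inside brackets, a leading ^ negates the character class, so [^\n] matches any single character that is not a newline.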
[^\n]+ means grab every character after Stream:<whitespace> that is not a newline. In other words, all characters up to the newline character.
fyi. Remove the .group(1) at the end of the re.search() call to get the match object itself (or None if there was no match), which works as a "was there a match" test in an if statement. Also: more depth on re.search() vs re.findall() here: stackoverflow.com/a/25565090/605356
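The points in these comments can be checked with a short sketch. The text below is a shortened stand-in for the file (the Client value is made up), and the variable names are only for illustration.

```python
import re

# Shortened stand-in for mystring.txt; the Client value is hypothetical.
text = "Client: //depot/client\nStream: //MainStream/testing_branch\nOptions: dir\n"

# Without re.MULTILINE, ^ only matches at the very start of the string,
# and the string starts with "Client:", so there is no match.
no_flag = re.search(r"^Stream:\s([^\n]+)", text)

# With re.MULTILINE, ^ also matches right after each newline.
with_flag = re.search(r"^Stream:\s([^\n]+)", text, re.MULTILINE)

# Without re.DOTALL, "." does not match a newline either,
# so (.+) captures the same text as ([^\n]+) here.
dot_version = re.search(r"^Stream:\s(.+)", text, re.MULTILINE)

print(no_flag)
print(with_flag.group(1))
print(dot_version.group(1))

# A match object is truthy and None is falsy, so the result can be tested directly.
if with_flag:
    print("found a Stream line")
```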
0

Here is the solution

import re

f = open("mystring.txt").read()
got = re.findall("Stream: .+\n", f)
got = got[0].strip()
print(got.split(": ")[1])
