python: extracting (regex) pattern in a file without going through line by line (multiline search)

Question

I can extract a particular pattern by reading mystring.txt line by line and checking each line with re.search(r'pattern', line_txt).

Here is mystring.txt:

Client: //home/SCM/dev/applications/build_system/test_suite_linux/unit_testing

Stream: //MainStream/testing_branch

Options:    dir, norm accel, ddl

SubmitOptions:  vis, dir, cas, cat

Using Python, I can get the stream name, //MainStream/testing_branch, like this:

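import re

with open("mystring.txt", 'r') as f:
    # iterate over the file object directly instead of calling readlines()
    for line in f:
        if re.search(r'^Stream:', line):
            # the value follows a tab after the "Stream:" label
            stream_name = line.split('\t')[1].strip()
            print(stream_name)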

Instead of looping over the file line by line, how can I extract the same information using only the re module?

Use f.read() for the whole buffer. Then try a re.search()

– Marcel Wilson, Commented May 19, 2016 at 17:54

rock321987 · Accepted Answer · 2016-05-19 17:55:29Z

4

You can read the file in one go and use re.findall() with the re.M (multiline) flag. Beware: if the file is very large, loading it all into memory is not a good idea.

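import re

with open("mystring.txt") as f:
    content = f.read()
# re.M makes ^ match at the start of every line, not only at the start of the string
print(re.findall(r"^Stream: (.*)", content, re.M))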
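On the memory caveat above: one option, sketched here as an assumption rather than as part of the original answer, is to memory-map the file with the standard mmap module. The re module can scan an mmap object directly as long as the pattern is a bytes pattern, so the file is never read into a Python string. The helper name find_streams is hypothetical, and the pattern keeps the answer's single space after "Stream:" (a tab-separated file would need \s instead).

```python
import mmap
import re

def find_streams(path):
    """Hypothetical helper: return every 'Stream:' value in the file at path.

    The file is memory-mapped instead of read into a str; re accepts the
    mmap as a bytes-like buffer, so the pattern must be bytes (rb"...").
    """
    with open(path, "rb") as f:
        # length 0 maps the whole file; ACCESS_READ maps it read-only
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # re.M lets ^ match at the start of each line in the buffer
            return [m.decode() for m in re.findall(rb"^Stream: (.*)", mm, re.M)]
```

For the mystring.txt shown in the question, find_streams("mystring.txt") should return a one-element list containing '//MainStream/testing_branch'. Note that mmap cannot map a zero-length file, so an empty file raises ValueError.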

answered May 19, 2016 at 17:55 by rock321987


Johnny Utahh Over a year ago

fyi. More depth on re.search() vs re.findall() here: stackoverflow.com/a/37330608/605356
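The difference the comment points to can be seen on a small, made-up input with two Stream lines (the text below is hypothetical, not from the question): re.search stops at the first match and returns a Match object, while re.findall returns every match, giving just the captured group when the pattern has one group.

```python
import re

# Hypothetical text with two Stream lines, to contrast search and findall
text = "Stream: //MainStream/a\nOptions: dir\nStream: //MainStream/b\n"
pattern = r"^Stream: (.*)"

# re.search returns only the first match, as a Match object (or None)
first = re.search(pattern, text, re.M)
print(first.group(1))  # //MainStream/a

# re.findall returns all matches; with one group, a list of that group's text
print(re.findall(pattern, text, re.M))  # ['//MainStream/a', '//MainStream/b']
```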

UltraInstinct · Accepted Answer · 2016-05-19 17:49:31Z

2

Yes, you can use re.search(..) with the re.MULTILINE flag, so that ^ matches at the start of every line:

>>> import re
>>> with open("mystring.txt") as f:
...     content = f.read()
...
>>> re.search(r'^Stream:\s([^\n]+)', content, re.MULTILINE).group(1)
'//MainStream/testing_branch'

answered May 19, 2016 at 17:49 by UltraInstinct


Sha Over a year ago

This is what I was looking for. Could you please explain what ([^\n]+) does? Thanks

rkatkam Over a year ago

^ matches the start of the string and $ the end (with re.MULTILINE, the start and end of each line). [^\n] matches any single character except a newline.

UltraInstinct Over a year ago

[^\n]+ means grab each character after Stream:<whitespace> that is not a newline. In other words, all characters up to the newline character.
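A small check of that explanation on a hypothetical three-line string: without re.DOTALL, . also refuses to match a newline, so .+ captures exactly what [^\n]+ does; with re.DOTALL, .+ runs on to the end of the string.

```python
import re

# Hypothetical sample text (not the question's exact file)
text = "Client: x\nStream: //MainStream/testing_branch\nOptions: dir\n"

# [^\n]+ : one or more characters that are not a newline
a = re.search(r'^Stream:\s([^\n]+)', text, re.MULTILINE).group(1)
# without re.DOTALL, '.' also stops at a newline, so .+ captures the same text
b = re.search(r'^Stream:\s(.+)', text, re.MULTILINE).group(1)
# with re.DOTALL, '.' matches newlines too, so .+ runs to the end of the string
c = re.search(r'^Stream:\s(.+)', text, re.MULTILINE | re.DOTALL).group(1)

print(a)        # //MainStream/testing_branch
print(a == b)   # True
print(repr(c))  # '//MainStream/testing_branch\nOptions: dir\n'
```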

Johnny Utahh Over a year ago

fyi. Remove the .group(1) at the end of the re.search() call to get the match object itself, which is None when there is no match, so it works as a "was there a match" test. Also: more depth on re.search() vs re.findall() here: stackoverflow.com/a/25565090/605356
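Following that comment, a minimal sketch of guarding the one-liner: because re.search returns None when nothing matches, calling .group(1) directly would raise AttributeError on a file without a Stream line. The helper name find_stream is hypothetical.

```python
import re

def find_stream(text):
    """Hypothetical helper: return the Stream value, or None if there is none."""
    # re.search returns a Match object on success and None on failure,
    # so check the result before calling .group(1)
    m = re.search(r"^Stream:\s([^\n]+)", text, re.MULTILINE)
    return m.group(1) if m else None

print(find_stream("Stream: //MainStream/testing_branch\n"))  # //MainStream/testing_branch
print(find_stream("Options: dir\n"))  # None
# the Match object (or None) can also be used directly as a truth value
print(bool(re.search(r"^Stream:", "Options: dir\n", re.MULTILINE)))  # False
```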

nainometer · Accepted Answer · 2020-07-29 02:42:26Z

0

Here is the solution

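import re

with open("mystring.txt") as f:
    content = f.read()

# each match is a whole "Stream: ..." line, including its trailing newline
got = re.findall(r"Stream: .+\n", content)

# take the first match, strip the newline, keep the text after "Stream: "
got = got[0].strip()

print(got.split(": ", 1)[1])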
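One edge case worth knowing with this pattern, shown on hypothetical content: "Stream: .+\n" needs a newline after the value, so it finds nothing when the Stream line is the last line of a file that does not end with a newline. Anchoring with ^ and $ under re.M, as in the other answers, avoids that, because $ also matches at the very end of the string.

```python
import re

# Hypothetical content whose last line has no trailing newline
content = "Options: dir\nStream: //MainStream/testing_branch"

# the pattern requires "\n" after the value, so nothing is found here
print(re.findall(r"Stream: .+\n", content))  # []

# with re.M, $ matches before a newline or at the end of the string
print(re.findall(r"^Stream: (.+)$", content, re.M))  # ['//MainStream/testing_branch']
```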

edited Jul 29, 2020 at 2:42 by nainometer

answered May 19, 2016 at 17:59 by Ishaq Khan
