Extracting lines that start with a given string from a file using Python

Question

Team,

I want to extract the lines that start with a given string (tg_) from a file. The regex below gives me output, but I have two questions:

I am not sure how to extract a line when it continues over 2 or more lines, each ending with \, like below.
I don't know how to remove the special characters using the existing regex below.

*****from a file*******

tg_cr_counters dghbvcvgfv

tg_kk_bb a group1 bye bye bye hi hi hi 1 \ <<<<
patch mac hdfh f dgf asadasf \
dgfgmnhnjgfg

tg_cr_counters gthghtrhgh }} ] <<<<<

tg_cr_counters fkgnfkmngvd

import re

file = open("C:\\Users\\input.tcl", "r")
f1 = file.readlines()

output = open("extract.txt", "a+")

match_list = [ ]   

for item in f1:

    match_list = re.findall(r'[t][g][_]+\w+.*', item)
    if(len(match_list)>0):
        output.write(match_list[0]+"\r\n")
        print(match_list)

Can we assume that when there is a single newline, then the line is continuing, and that if there are two consecutive newlines it is not? I'm not clear on what you want to extract.
– user2201041, Commented Nov 28, 2018 at 17:07
@JETM The line I want to grep is a multiline one whose lines end with "\". When I use the regex, it extracts only the first line ending with \, not the second and third lines.
– Charles Daniel, Commented Nov 30, 2018 at 5:45

Patrick Artner · Accepted Answer · 2018-11-28 17:32:35Z

You can use a regex with the flags re.MULTILINE and re.DOTALL.

With re.DOTALL, a . also matches \n, and with re.MULTILINE, ^ matches at the start of every line. You can then look for anything that starts with tg_ (no need to put each character in []) and ends with a blank line, \n\n, or the end of the text, \Z:

# Write the sample input to a demo file.
fn = "t.txt"
with open(fn, "w") as f:
    f.write("""*****from a file*******

tg_cr_counters dghbvcvgfv

tg_kk_bb a group1 bye bye bye hi hi hi 1 \ <<<<
patch mac hdfh f dgf asadasf \
dgfgmnhnjgfg

tg_cr_counters gthghtrhgh }} ] <<<<<

tg_cr_counters fkgnfkmngvd
""")

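import re

# "a+" appends, so re-running adds to an existing extract.txt.
with open("extract.txt", "a+") as o, open(fn) as f:
    # re.M: ^ matches at every line start; re.S: . also matches "\n".
    # The lazy .*? stops at the first blank line (\n\n) or at the end of the text (\Z).
    for m in re.findall(r'^tg_.*?(?:\n\n|\Z)', f.read(), flags=re.M|re.S):
        o.write("-"*40+"\n")   # text mode handles the platform line ending
        o.write(m)
        o.write("-"*40+"\n")

# Show what was written.
with open("extract.txt") as f:
    print(f.read())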

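Output (each match sits between two lines of dashes). Note that the second and third lines of the tg_kk_bb block are already joined in t.txt: the sample is written from a regular (non-raw) triple-quoted string, where a \ directly before a newline is a line continuation, so both characters are dropped: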

----------------------------------------
tg_cr_counters dghbvcvgfv

----------------------------------------
----------------------------------------
tg_kk_bb a group1 bye bye bye hi hi hi 1 \ <<<<
patch mac hdfh f dgf asadasf dgfgmnhnjgfg

----------------------------------------
----------------------------------------
tg_cr_counters gthghtrhgh }} ] <<<<<

----------------------------------------
----------------------------------------
tg_cr_counters fkgnfkmngvd
----------------------------------------

The re.findall() result looks like this:

['tg_cr_counters dghbvcvgfv\n\n', 
 'tg_kk_bb a group1 bye bye bye hi hi hi 1 \\ <<<<\npatch mac hdfh f dgf asadasf dgfgmnhnjgfg\n\n', 
 'tg_cr_counters gthghtrhgh }} ] <<<<<\n\n', 
 'tg_cr_counters fkgnfkmngvd\n']
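
To see what each flag contributes, here is a small sketch on a made-up string (the sample text and the variable names are illustrative only, not taken from the question's file). It runs the pattern without flags, with re.M alone, and with re.M | re.S.

```python
import re

# Illustrative input: one entry continued with a backslash, a blank line, one plain entry.
text = "tg_a one \\\ntwo\n\ntg_b three\n"

# No flags: '^' anchors only at the start of the string and '.' stops at '\n'.
no_flags = re.findall(r'^tg_.*', text)

# re.M: '^' also matches at the start of every line, but '.' still stops at '\n'.
multiline_only = re.findall(r'^tg_.*', text, flags=re.M)

# re.M | re.S: '.' crosses newlines, so the lazy '.*?' runs up to the blank line
# (or to the end of the text for the last entry).
both = re.findall(r'^tg_.*?(?:\n\n|\Z)', text, flags=re.M | re.S)

print(no_flags)        # ['tg_a one \\']
print(multiline_only)  # ['tg_a one \\', 'tg_b three']
print(both)            # ['tg_a one \\\ntwo\n\n', 'tg_b three\n']
```

Only the last variant keeps the continuation line "two" together with the entry it belongs to.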

To enable multiline searches you need to read in more than one line at a time. If your file is humongous, reading it whole with f.read() can lead to memory problems.
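
If the file is too large to read in one go, one alternative is to walk it line by line and stitch continuation lines together. The sketch below is not part of the approach above: iter_tg_records is a hypothetical helper, and it assumes that a continued line has a backslash as its last non-whitespace character (the <<<< markers in the question are taken to be annotations, not file content).

```python
import io

def iter_tg_records(lines, prefix="tg_"):
    """Yield logical records starting with `prefix`, joining lines that end with a backslash."""
    record = None  # parts of the record being built, or None when outside a record
    for raw in lines:
        line = raw.rstrip("\r\n")
        if record is None:
            if not line.startswith(prefix):
                continue  # not inside a record and not the start of one
            record = []
        stripped = line.rstrip()
        continued = stripped.endswith("\\")
        if continued:
            stripped = stripped[:-1].rstrip()  # drop the trailing backslash
        record.append(stripped)
        if not continued:
            yield " ".join(record)
            record = None
    if record is not None:  # file ended in the middle of a continued record
        yield " ".join(record)

# Usage on an in-memory stand-in for the input file (assumed content, without the <<<< markers).
sample = io.StringIO(
    "*****from a file*******\n"
    "\n"
    "tg_cr_counters dghbvcvgfv\n"
    "\n"
    "tg_kk_bb a group1 bye bye bye hi hi hi 1 \\\n"
    "patch mac hdfh f dgf asadasf \\\n"
    "dgfgmnhnjgfg\n"
    "\n"
    "tg_cr_counters fkgnfkmngvd\n"
)
records = list(iter_tg_records(sample))
for r in records:
    print(r)
```

Because the generator only holds the current record, memory use stays proportional to the longest record rather than to the file size. Passing an open file object instead of the StringIO works the same way.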

edited Nov 28, 2018 at 17:32

answered Nov 28, 2018 at 17:26

Patrick Artner

2 Comments

Charles Daniel Over a year ago

Thanks, but you are extracting from the output; what I need is to extract from the input file, where the multiline entries starting with tg_ are.

Patrick Artner Over a year ago

@CharlesDaniel I am creating a file that holds all the lines at the top of the code. Then I extract from the whole file (f.read()) the lines you see between the -------------------- separators; they are multiline matches. I don't quite get what you mean. t.txt holds 4 entries starting with tg_, plus a leading line containing '*****from a file*******' that is not captured because it does not start with tg_.
