Finding data in between two strings in Python

Question

I have a text file which contains data in a format like this:

PAGE(leave) 'Data1'
line 1
line 2 
line 2
...
...
...
PAGE(enter) 'Data1'

I need to get all the lines between the two keywords and save them to a text file. This is what I have come up with so far, but I have an issue with the single quotes: the regular expression treats them as the quotes delimiting the expression rather than as part of the keyword.

My code so far:

log_file = open('messages','r')
data = log_file.read()
block = re.compile(ur'PAGE\(leave\) \'Data1\'[\S ]+\s((?:(?![^\n]+PAGE\(enter\) \'Data1\').)*)', re.IGNORECASE | re.DOTALL)
data_in_home_block=re.findall(block, data)
file = 0
make_directory("home_to_home_data",1)
for line in data_in_home_block:
    file = file + 1
    with open("home_to_home_" + str(file) , "a") as data_in_home_to_home:
        data_in_home_to_home.write(str(line))

It would be great if someone could show me how to implement this.

So your file actually contains a backslash before the parenthesis? Like \(?
– Savir, Commented Dec 7, 2014 at 23:55
Why use regex at all if the keywords are not variable? Just look for them, get their locations in the text, then retrieve what's between.
– Joan Charmant, Commented Dec 7, 2014 at 23:55

ekhumoro · Accepted Answer · 2014-12-08 01:28:28Z

As pointed out by @JoanCharmant, it is not necessary to use regex for this task, because the records are delimited by fixed strings.

Something like this should be enough:

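# str.split and str.rpartition treat the delimiters as plain text, not as
# regex, so the backslashes below must literally be present in the file.
with open('messages') as stream:
    messages = stream.read()

blocks = [block.rpartition(r"PAGE\(enter\) 'Data1'")[0]
          for block in messages.split(r"PAGE\(leave\) 'Data1'")
          if block and not block.isspace()]

for count, block in enumerate(blocks, 1):
    with open('home_to_home_%d' % count, 'a') as stream:
        stream.write(block)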
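
The approach suggested in the comments (find the keywords, take their positions, slice out what lies between) can also be sketched with str.find. This is only an illustration: the helper name blocks_between and the sample text are made up here, and the sample uses plain parentheses as in the question's listing. If the file really contains backslashes, include them in the delimiter strings.

```python
def blocks_between(text, start, end):
    """Return the text found between each start/end delimiter pair."""
    blocks = []
    pos = 0
    while True:
        begin = text.find(start, pos)
        if begin == -1:
            break  # no further start delimiter
        begin += len(start)
        finish = text.find(end, begin)
        if finish == -1:
            break  # unterminated block: ignore it
        blocks.append(text[begin:finish])
        pos = finish + len(end)
    return blocks


# Hypothetical sample data, not the asker's real file.
sample = (
    "PAGE(leave) 'Data1'\nline 1\nline 2\nPAGE(enter) 'Data1'\n"
    "PAGE(leave) 'Data1'\nline 3\nPAGE(enter) 'Data1'\n"
)

for block in blocks_between(sample, "PAGE(leave) 'Data1'", "PAGE(enter) 'Data1'"):
    print(repr(block))
```

Because the delimiters are ordinary strings, no escaping of parentheses or quotes is needed at all.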


Community · Accepted Answer · 2017-05-23 11:57:43Z

If it's the single quotes that worry you, you can delimit the regular expression string with double quotes instead:

'hello "howdy"'  # Correct
"hello 'howdy'"  # Correct

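Now, there are more issues here. Even when the pattern is declared as a raw string (with the r prefix), you must still escape a literal backslash inside the regular expression, because the backslash is also the regex escape character (see What does the "r" in pythons re.compile(r' pattern flags') mean?). To match the two characters \( as they appear in the file, the pattern needs \\\( : \\ for the backslash and \( for the parenthesis. Without the r prefix you would need even more backslashes.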
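
To make the backslash counting concrete, here is a small check. It assumes the text really contains a backslash before each parenthesis; the sample string is made up for illustration.

```python
import re

# Hypothetical sample line that contains literal backslashes.
text = r"PAGE\(leave\) 'Data1'"

# Raw string: \\ matches one backslash, \( matches one parenthesis.
raw_pattern = r"PAGE\\\(leave\\\) 'Data1'"

# Same pattern without the r prefix: every backslash has to be doubled.
plain_pattern = "PAGE\\\\\\(leave\\\\\\) 'Data1'"

print(raw_pattern == plain_pattern)               # True
print(re.search(raw_pattern, text) is not None)   # True
```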

I've created a test file with two "sections":

PAGE\(leave\) 'Data1'
line 1
line 2 
line 3
PAGE\(enter\) 'Data1'

PAGE\(leave\) 'Data1'
line 4
line 5 
line 6
PAGE\(enter\) 'Data1'

The code below should do what you want (I think):

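import re

with open('test.txt', 'r') as log_file:
    data = log_file.read()

# Every piece of the pattern needs the r prefix: adjacent string literals
# are concatenated, but each one is parsed on its own.
block = re.compile(
    r"(PAGE\\\(leave\\\) 'Data1'\n)"
    r"(.*?)"
    r"(PAGE\\\(enter\\\) 'Data1')",
    re.IGNORECASE | re.DOTALL | re.MULTILINE
)
# findall returns one tuple per match; index 1 is the middle group.
data_in_home_block = [result[1] for result in re.findall(block, data)]
for data_block in data_in_home_block:
    print("Found data_block: %s" % (data_block,))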

Outputs:

Found data_block: line 1
line 2 
line 3

Found data_block: line 4
line 5 
line 6
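
If counting backslashes by hand feels error-prone, one alternative is to let re.escape build the pattern from the literal delimiters. The following is a sketch under assumptions, not part of the answer above: the function name find_blocks and the sample text are invented for illustration, and the re.IGNORECASE flag is left out for brevity.

```python
import re


def find_blocks(text, start, end):
    """Return the text between each start/end pair, using a regex."""
    # re.escape neutralises every regex metacharacter in the delimiters,
    # so they are matched exactly as written.
    pattern = re.compile(
        re.escape(start) + r"\n(.*?)" + re.escape(end),
        re.DOTALL,
    )
    # With a single capturing group, findall returns a list of strings.
    return pattern.findall(text)


# Delimiters as they appear in the test file, backslashes included.
start = r"PAGE\(leave\) 'Data1'"
end = r"PAGE\(enter\) 'Data1'"

# Hypothetical sample text with two sections.
sample = (
    start + "\nline 1\nline 2\n" + end + "\n\n"
    + start + "\nline 3\n" + end + "\n"
)

for data_block in find_blocks(sample, start, end):
    print(repr(data_block))
```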
