I have a text file that contains blocks in the following format:

PAGE(leave) 'Data1'
line 1
line 2 
line 2
...
...
...
PAGE(enter) 'Data1'

I need to get all the lines between the two keywords and save them to a text file. This is what I have come up with so far, but I have an issue with the single quotes: the regular expression treats them as the string's quote characters rather than as part of the keyword.

My code so far:

log_file = open('messages','r')
data = log_file.read()
block = re.compile(ur'PAGE\(leave\) \'Data1\'[\S ]+\s((?:(?![^\n]+PAGE\(enter\) \'Data1\').)*)', re.IGNORECASE | re.DOTALL)
data_in_home_block=re.findall(block, data)
file = 0
make_directory("home_to_home_data",1)
for line in data_in_home_block:
    file = file + 1
    with open("home_to_home_" + str(file) , "a") as data_in_home_to_home:
        data_in_home_to_home.write(str(line))

It would be great if someone could guide me on how to implement this.

  • So your file actually contains a backslash before the parenthesis? Like \(? Commented Dec 7, 2014 at 23:55
  • Why use regex at all if the keywords are not variable? Just look for them, get their locations in the text, then retrieve what's between. Commented Dec 7, 2014 at 23:55

2 Answers

As pointed out by @JoanCharmant, it is not necessary to use regex for this task, because the records are delimited by fixed strings.

Something like this should be enough:

messages = open('messages').read()

blocks = [block.rpartition(r"PAGE\(enter\) 'Data1'")[0]
          for block in messages.split(r"PAGE\(leave\) 'Data1'")
          if block and not block.isspace()]

for count, block in enumerate(blocks, 1):
    with open('home_to_home_%d' % count, 'a') as stream:
        stream.write(block)
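
Note that str.split and str.rpartition match their argument literally, not as a regular expression, so the delimiters above only match a file that really contains the backslashes. If the markers are written exactly as in the question, without backslashes, plain strings are needed. A minimal sketch under that assumption, using str.find as suggested in the comments (the name extract_blocks and the sample text are made up for illustration):

```python
LEAVE = "PAGE(leave) 'Data1'"   # assumed start marker, as written in the question
ENTER = "PAGE(enter) 'Data1'"   # assumed end marker, as written in the question


def extract_blocks(text, start=LEAVE, end=ENTER):
    """Return the text found between each start/end marker pair."""
    blocks = []
    pos = 0
    while True:
        begin = text.find(start, pos)
        if begin == -1:
            break               # no further start marker
        begin += len(start)
        stop = text.find(end, begin)
        if stop == -1:
            break               # unterminated block: ignore it
        # Drop the newlines that directly follow/precede the markers.
        blocks.append(text[begin:stop].strip("\n"))
        pos = stop + len(end)
    return blocks


# Invented sample data for illustration only.
sample = (
    "noise\n"
    "PAGE(leave) 'Data1'\nline 1\nline 2\nPAGE(enter) 'Data1'\n"
    "PAGE(leave) 'Data1'\nline 3\nPAGE(enter) 'Data1'\n"
)
print(extract_blocks(sample))
```

Each returned string can then be written to its own file in the same way as in the loop above.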

If it's the single quotes that worry you, you can delimit the regular expression string with double quotes instead:

'hello "howdy"'  # Correct
"hello 'howdy'"  # Correct

Now, there are more issues here. Even when the pattern is declared as a raw string (the r prefix), a literal backslash in the text must still be escaped in the regular expression itself (see What does the "r" in pythons re.compile(r' pattern flags') mean?). The r prefix only spares you the extra layer of escaping that Python's own string syntax would otherwise require; without it you would need even more backslashes.
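
To make the escaping concrete, here is a small sketch (the two sample strings are invented for illustration). In a raw-string pattern, \( matches a literal parenthesis and \\ matches a literal backslash, so \\\( matches the two characters \( in the text. re.escape can build such a pattern from the literal marker text.

```python
import re

plain = "PAGE(leave) 'Data1'"         # marker without backslashes
slashed = r"PAGE\(leave\) 'Data1'"    # marker that really contains backslashes

# \( and \) match literal parentheses in the text.
assert re.search(r"PAGE\(leave\) 'Data1'", plain)

# The same pattern does not match the text that contains backslashes.
assert re.search(r"PAGE\(leave\) 'Data1'", slashed) is None

# \\ matches one literal backslash, so \\\( matches a backslash plus "(".
assert re.search(r"PAGE\\\(leave\\\) 'Data1'", slashed)

# re.escape builds an equivalent pattern from the literal text.
assert re.search(re.escape(slashed), slashed)
```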

I've created a test file with two "sections":

PAGE\(leave\) 'Data1'
line 1
line 2 
line 3
PAGE\(enter\) 'Data1'

PAGE\(leave\) 'Data1'
line 4
line 5 
line 6
PAGE\(enter\) 'Data1'

The code below should do what you want (I think):

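import re

# Read the whole file so the pattern can span several lines.
with open('test.txt', 'r') as log_file:
    data = log_file.read()

# Every piece must be a raw string: the r prefix applies only to the
# literal it is attached to, not to the whole concatenation.
block = re.compile(
    r"(PAGE\\\(leave\\\) 'Data1'\n)"
    r"(.*?)"
    r"(PAGE\\\(enter\\\) 'Data1')",
    re.IGNORECASE | re.DOTALL | re.MULTILINE
)
# The second group (index 1) holds the lines between the two markers.
data_in_home_block = [result[1] for result in block.findall(data)]
for data_block in data_in_home_block:
    print("Found data_block: %s" % (data_block,))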

Outputs:

Found data_block: line 1
line 2 
line 3

Found data_block: line 4
line 5 
line 6
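
The question also asks for each block to be saved to its own file rather than printed. A minimal sketch of that step, assuming the markers contain no backslashes (as in the question's sample) and reusing the home_to_home_ file name prefix from the question; the function name save_blocks, the out_dir parameter and the sample text are hypothetical:

```python
import os
import re
import tempfile

# Assumes the markers appear without backslashes, as in the question.
PATTERN = re.compile(
    r"PAGE\(leave\) 'Data1'\n(.*?)PAGE\(enter\) 'Data1'",
    re.DOTALL,
)


def save_blocks(text, out_dir):
    """Write each matched block to out_dir/home_to_home_<n>; return the paths."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    # With a single capturing group, findall returns a list of strings.
    for count, block in enumerate(PATTERN.findall(text), 1):
        path = os.path.join(out_dir, "home_to_home_%d" % count)
        with open(path, "w") as stream:
            stream.write(block)
        paths.append(path)
    return paths


# Invented sample data, written to a temporary directory for the demo.
sample = (
    "PAGE(leave) 'Data1'\nline 1\nline 2\nPAGE(enter) 'Data1'\n"
    "PAGE(leave) 'Data1'\nline 3\nPAGE(enter) 'Data1'\n"
)
demo_dir = tempfile.mkdtemp()
written = save_blocks(sample, demo_dir)
print(written)
```

Opening the files with "w" rather than "a" means a second run overwrites the earlier output instead of appending to it; switch back to "a" if appending is what you want.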
