I'm not sure how best to word this, so I'll dive straight into an example.

a bunch of lines we don't care about [...]
This is a nice line I can look for
This is the string I wish to extract
a bunch more lines we don't care about [...]
This line contains an integer 12345 related to the string above
more garbage [...]

But sometimes (and I have no control over this) the order is swapped:

a bunch of lines we don't care about [...]
Here is another string I wish to extract
This is a nice line I can look for
a bunch more lines we don't care about [...]
This line contains an integer 67890 related to the string above
more garbage [...]

The two lines ("nice line" and "string I wish to extract") are always adjacent, but their order is not predictable. The line containing the integer appears an inconsistent number of lines below them. The "nice line" appears multiple times and is always identical. Across the whole file, the strings and integers I'm extracting may repeat or may differ from one occurrence to the next.

Ultimately the idea is to populate two lists, one containing the strings and the other containing the integers, both ordered as they are found so the two can later be used as key/value pairs.

What I have no idea how to do, or even if it's possible, is to write a regex that finds the string immediately before OR after a target line.

Doing this in Python, btw.

Thoughts?

edit/addition: So what I'm expecting as a result out of the above sample text would be something like:

list1 = ["This is the string I wish to extract", "Here is another string I wish to extract"]
list2 = [12345, 67890]
  • Do you need to generalize this to more than two out-of-order lines? Commented Nov 9, 2014 at 2:50
  • Can you give some concrete example of what you want to see? Commented Nov 9, 2014 at 3:42
  • @KarlKnechtel in this instance, no. It's only two. Commented Nov 9, 2014 at 8:21
  • Actually, I no longer require help with this, because in this particular instance I since noticed that my lines that I need to search for have something unique about them, making the job dead easy. Still, the quandary remains and an answer could be useful in the future. jp24's method is certainly adequate but if a more generalized regex (or other) method exists... well, I'll keep it open for a bit. Commented Nov 9, 2014 at 8:26

1 Answer

A good strategy might be to look for "nice lines" and then search the lines above and below.

See the following (untested) Python pseudocode:

L1, L2 = [], []
with open("file.txt") as f:
    lines = f.readlines()
for i, line in enumerate(lines):
    if 'nice line' in line:
        # Clamp both indices so they stay inside the list
        before_line = lines[max(i - 1, 0)]
        after_line = lines[min(i + 1, len(lines) - 1)]
        # You can generalize the above to a few lines above and below

        # Use regex to parse information from `before_line` and `after_line`
        # and add it to the lists: L1, L2
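
The strategy above could be fleshed out roughly as follows. This is only a sketch built on the sample text from the question. The patterns `WANTED` and `INTEGER_LINE` and the function name `extract_pairs` are assumptions made for illustration: in real data you would need your own rule for telling the wanted neighbour from a garbage neighbour, and your own pattern for the integer line.

```python
import re

# Marker text taken from the question's sample.
NICE_LINE = "This is a nice line I can look for"
# Assumption: the wanted line can be recognized by this phrase.
WANTED = re.compile(r"string I wish to extract")
# Assumption: the integer line always contains the word "integer" before the number.
INTEGER_LINE = re.compile(r"integer (\d+)")


def extract_pairs(lines):
    """Return (strings, integers), both in the order they are found."""
    strings, integers = [], []
    awaiting_integer = False
    for i, line in enumerate(lines):
        text = line.strip()
        if text == NICE_LINE:
            # Collect the neighbours that actually exist (before first, then after).
            neighbours = []
            if i > 0:
                neighbours.append(lines[i - 1].strip())
            if i + 1 < len(lines):
                neighbours.append(lines[i + 1].strip())
            for candidate in neighbours:
                if WANTED.search(candidate):
                    strings.append(candidate)
                    awaiting_integer = True
                    break
        elif awaiting_integer:
            # After a string is found, take the next line that matches the integer pattern.
            match = INTEGER_LINE.search(text)
            if match:
                integers.append(int(match.group(1)))
                awaiting_integer = False
    return strings, integers


sample = """\
a bunch of lines we don't care about [...]
This is a nice line I can look for
This is the string I wish to extract
a bunch more lines we don't care about [...]
This line contains an integer 12345 related to the string above
more garbage [...]
a bunch of lines we don't care about [...]
Here is another string I wish to extract
This is a nice line I can look for
a bunch more lines we don't care about [...]
This line contains an integer 67890 related to the string above
more garbage [...]
"""

strings, integers = extract_pairs(sample.splitlines())
```

The two lists can then be paired with `dict(zip(strings, integers))`. Note that if an integer line can ever be missing, the lists would drift out of step. In that case, collecting each string together with its integer as a tuple would be safer.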