2

I am trying to strip some repeated text out of my Kindle clippings that look like this:

 The starting point,obviously,is a thorough analysis ofthe intellectual property portfolio,the contents ofwhich can be broadly divided into two categories:property that is in use and property that is not in use
 ==========
 Essentials of Licensing Intellectual Property (Alexander I. Poltorak, Paul J. Lerner)
 - Highlight on Page 25 | Added on Friday, 25 November 11 10:53:36 Greenwich Mean Time

 commentators (a euphemism for prolific writers with little experience
 ==========
 Essentials of Licensing Intellectual Property (Alexander I. Poltorak, Paul J. Lerner)
 - Highlight on Page 26 | Added on Friday, 25 November 11 10:54:29 Greenwich Mean Time

I am trying to strip out everything between "Essentials" and "Time". The regexp I am playing with right now looks like this:

Essentials([^,]+)Time

But obviously it is not working:

http://rubular.com/r/gwSJFgOQai

Any help for this nuby would be massively appreciated!

1
  • 'Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.' .... what's going to happen if you have a book called "Time Management Essentials"? Commented Nov 29, 2011 at 6:40

3 Answers

7

You need the /m modifier, which in Ruby makes . match a newline:

/Essentials(.*?)Time/m

See it working here: http://rubular.com/r/qgmkWnLzW6
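To actually remove the matched text rather than just match it, something like the following should work. This is a minimal sketch, assuming the clippings are already loaded into a string; the names clippings and stripped and the shortened highlight text are made up for illustration, and the trailing \n? is an addition so that the emptied lines do not linger:

```ruby
# Shortened sample data in the same shape as the clippings in the question.
clippings = <<~EOT
  first highlight
  ==========
  Essentials of Licensing Intellectual Property (Alexander I. Poltorak, Paul J. Lerner)
  - Highlight on Page 25 | Added on Friday, 25 November 11 10:53:36 Greenwich Mean Time

  second highlight
  ==========
  Essentials of Licensing Intellectual Property (Alexander I. Poltorak, Paul J. Lerner)
  - Highlight on Page 26 | Added on Friday, 25 November 11 10:54:29 Greenwich Mean Time
EOT

# /m lets . cross newlines; .*? is non-greedy, so each match ends at the
# first "Time" after "Essentials". \n? also swallows the line break after it.
stripped = clippings.gsub(/Essentials.*?Time\n?/m, "")

puts stripped
```

As the comment on the question points out, this still misfires if the word "Essentials" or "Time" shows up inside a highlight, so treat it as a starting point rather than a robust parser.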


2 Comments

Lol. I forgot to put m in mine. I used it in rubular same as you. Just forgot. :S +1
Thank you so much for taking the time to answer my problem.
3

Why don't you use this:

/Essentials(.*?)Time/m

Updated. Forgot the m for multiline.


1

Regexes are powerful, but you'll find they often add needless complexity to a problem.

This is how I'd go about the problem:

text = <<EOT
The starting point,obviously,is a thorough analysis ofthe intellectual property portfolio,the contents ofwhich can be broadly divided into two categories:property that is in use and property that is not in use
==========
Essentials of Licensing Intellectual Property (Alexander I. Poltorak, Paul J. Lerner)
- Highlight on Page 25 | Added on Friday, 25 November 11 10:53:36 Greenwich Mean Time

commentators (a euphemism for prolific writers with little experience
==========
Essentials of Licensing Intellectual Property (Alexander I. Poltorak, Paul J. Lerner)
- Highlight on Page 26 | Added on Friday, 25 November 11 10:54:29 Greenwich Mean Time
EOT

text.each_line do |l|
  l.chomp!
  # Skip every line from one starting with "Essentials" through one ending with "Time"
  next if ((l =~ /\AEssentials/) .. (l =~ /Time\z/))

  puts l
end

Which outputs:

The starting point,obviously,is a thorough analysis ofthe intellectual property portfolio,the contents ofwhich can be broadly divided into two categories:property that is in use and property that is not in use
==========

commentators (a euphemism for prolific writers with little experience
==========

This works because .., AKA the range operator, behaves differently when used in a conditional such as an if, where it turns into what we call the flip-flop operator. In operation, ((l =~ /\AEssentials/) .. (l =~ /Time\z/)) returns false until (l =~ /\AEssentials/) matches. From then on it returns true, up to and including the line where (l =~ /Time\z/) matches. After that final regex matches, it goes back to returning false.
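To see the flip-flop's state line by line, here is a small sketch with made-up marker strings ("START" and "END" stand in for the two regexes; lines and states are illustrative names only):

```ruby
lines = ["keep 1", "START", "inside", "END", "keep 2"]

# Pair each line with the flip-flop's value for that line.
states = lines.map do |line|
  if (line == "START")..(line == "END")
    [line, true]   # switched on: from "START" through "END", inclusive
  else
    [line, false]  # switched off: before "START" and after "END"
  end
end

states.each { |line, on| puts "#{line}: #{on}" }
```

Both marker lines count as being inside the range, which is why the "Essentials" line and the "Time" line are skipped along with anything between them.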

This behavior works really well for extracting sections from text.

If you are aggregating text for subsequent output, replace the puts l with something that appends l to a buffer, then output that buffer at the end of your run.
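A minimal sketch of that buffered variant, assuming an array is an acceptable buffer (the name buffer and the shortened sample text are made up for illustration):

```ruby
text = <<~EOT
  first highlight
  ==========
  Essentials of Licensing Intellectual Property (Alexander I. Poltorak, Paul J. Lerner)
  - Highlight on Page 25 | Added on Friday, 25 November 11 10:53:36 Greenwich Mean Time

  second highlight
  ==========
  Essentials of Licensing Intellectual Property (Alexander I. Poltorak, Paul J. Lerner)
  - Highlight on Page 26 | Added on Friday, 25 November 11 10:54:29 Greenwich Mean Time
EOT

buffer = []

text.each_line do |l|
  line = l.chomp
  # Same flip-flop as above: skip the "Essentials" line through the "Time" line.
  next if (line =~ /\AEssentials/)..(line =~ /Time\z/)

  buffer << line  # collect the line instead of printing it right away
end

# Output everything in one go at the end of the run.
puts buffer.join("\n")
```

Collecting into an array keeps the lines separate, so they can still be filtered or written to a file before being joined.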

3 Comments

+1 for the flipflop; but the first sentence warns against regexes while the code uses 2 of them.
I must admit I was a little confused about that. Thank you so much for the solution!
The code uses two very simple ones, which are much easier to maintain than one complex one. Regex complexity doesn't seem to increase arithmetically.
