2

I have a string:

story = 'A long foo ago, in a foo bar baz, baz away...foobar'

I also have a list of matches from this string (the list is dynamic; I don't control its contents):

string_matches = ['foo', 'foo', 'bar', 'baz', 'baz', 'foobar'] # words can be repeated

How can I wrap each match in double asterisks (e.g. **foo**) to get this result:

story = 'A long **foo** ago, in a **foo** **bar** **baz**, **baz** away...**foobar**'

For example, my code:

string_matches.each do |word|
  story.gsub!(/#{word}/, "**#{word}**")
end

returned:

"A long ****foo**** ago, in a ****foo**** **bar** ****baz****, ****baz**** away...****foo******bar**"
2 Answers

4

If the keywords must be matched only as whole words, you may use

story.gsub(/\b(?:#{Regexp.union(string_matches.uniq.sort { |a,b| b.length <=> a.length }).source})\b/, '**\0**')

If the whole-word check is not necessary, use

story.gsub(Regexp.union(string_matches.uniq.sort { |a,b| b.length <=> a.length }), '**\0**')

See the Ruby demo

Details

  • \b - a word boundary
  • (?:#{Regexp.union(string_matches.uniq.sort { |a,b| b.length <=> a.length }).source}) - a non-capturing group that expands to a pattern like (?:foobar|foo|bar|baz). It matches a single word from the deduplicated list of keywords, sorted by length in descending order. See "Order of regular expression operator (..|.. ... ..|..)" for why the sorting is necessary: alternatives are tried left to right, so a shorter keyword listed first would win over a longer keyword that starts with it.
  • \b - a word boundary

The \0 in the replacement string is a backreference to the whole match.
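To see both variants side by side, here is a minimal sketch that uses the story and string_matches values from the question. The variable names (keywords, pattern, whole_word_pattern) and the extra 'a foot' input are my own, added only for illustration.

```ruby
story = 'A long foo ago, in a foo bar baz, baz away...foobar'
string_matches = ['foo', 'foo', 'bar', 'baz', 'baz', 'foobar']

# Deduplicate, then put longer keywords first so "foobar" is tried before "foo".
keywords = string_matches.uniq.sort { |a, b| b.length <=> a.length }

# Regexp.union escapes each string and joins the alternatives with "|".
pattern = Regexp.union(keywords)

# Whole-word variant: wrap the alternation in a group between word boundaries.
whole_word_pattern = /\b(?:#{pattern.source})\b/

# A single pass over the string, so every match is wrapped exactly once.
plain_result      = story.gsub(pattern, '**\0**')
whole_word_result = story.gsub(whole_word_pattern, '**\0**')

puts plain_result
# A long **foo** ago, in a **foo** **bar** **baz**, **baz** away...**foobar**
puts whole_word_result
# A long **foo** ago, in a **foo** **bar** **baz**, **baz** away...**foobar**

# Hypothetical extra input to show where the two variants differ.
puts 'a foot'.gsub(pattern, '**\0**')            # a **foo**t
puts 'a foot'.gsub(whole_word_pattern, '**\0**') # a foot
```

For the story in the question both patterns give the same result; they only diverge when a keyword appears inside a longer word, as in the 'a foot' line.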

2 Comments

The interpolation on the 2nd one is superfluous. You can just use story.gsub(Regexp.union(...), '**\0**')
I would also use .sort_by(&:length).reverse instead of .sort { |a, b| b.length <=> a.length }, which is in my opinion cleaner and more expressive. It is just a personal preference, and I'd understand if you leave the answer as is.
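A quick sketch of the two orderings mentioned in the comment above, assuming the string_matches list from the question (the variable names with_block and with_sort_by are mine). Both put the longest keyword first; the relative order of equal-length words may differ between them, which does not matter for the regex.

```ruby
string_matches = ['foo', 'foo', 'bar', 'baz', 'baz', 'foobar']

# Ordering used in the answer: comparison block, longest first.
with_block = string_matches.uniq.sort { |a, b| b.length <=> a.length }

# Ordering suggested in the comment: sort ascending by length, then reverse.
with_sort_by = string_matches.uniq.sort_by(&:length).reverse

p with_block.map(&:length)   # [6, 3, 3, 3]
p with_sort_by.map(&:length) # [6, 3, 3, 3]
```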
0

A slight change will nearly get you there:

irb(main):001:0> string_matches.uniq.each { |word| story.gsub!(/#{word}/, "**#{word}**") }
=> ["foo", "bar", "baz", "foobar"]
irb(main):002:0> story
=> "A long **foo** ago, in a **foo** **bar** **baz**, **baz** away...**foo****bar**"

The trouble with the final part of the resulting string is that foobar has already been rewritten by the earlier foo and bar passes, so by the time the foobar pass runs there is nothing left for it to match.
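The following sketch replays the loop one keyword at a time to show where the tail goes wrong. It assumes the story and the deduplicated word list from the irb session above; the names original, steps and reordered are mine. It also suggests that merely reordering the keywords longest-first does not rescue the one-pass-per-keyword approach, because later passes still match inside text that was already wrapped.

```ruby
original = 'A long foo ago, in a foo bar baz, baz away...foobar'
words = ['foo', 'bar', 'baz', 'foobar']

# Run one gsub! pass per keyword and keep a snapshot after each pass.
story = original.dup
steps = words.map do |word|
  story.gsub!(/#{word}/, "**#{word}**") # returns nil when nothing matched
  story.dup
end

words.zip(steps) { |word, text| puts "#{word.ljust(6)} -> #{text}" }
# After the "foo" pass the tail is "...**foo**bar", after the "bar" pass it is
# "...**foo****bar**", and the final "foobar" pass finds nothing to replace.

# Trying the same loop longest-first: "foobar" is wrapped correctly at first,
# but the later "foo" and "bar" passes match inside "**foobar**" again.
reordered = words.sort_by(&:length).reverse.reduce(original) do |text, word|
  text.gsub(/#{word}/, "**#{word}**")
end
puts reordered
```

This is why a single gsub over one combined pattern is the more robust option here: each position in the string is examined only once.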
