Ruby how to remove repeated regex in string

Question

For a string like

s = "(string1) this is text (string2) that's separated (string3)"

I need a way to remove all the parentheses and the text inside them. However, if I use the following, it returns an empty string:

s.gsub(/\(.*\)/, "")

What can I use to get the following?

" this is text  that's separated "

Cary Swoveland · Accepted Answer · 2015-01-25 08:10:49Z

4

You could do the following:

s.gsub(/\(.*?\)/,'')
  # => " this is text  that's separated "

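The ? after .* makes the quantifier "non-greedy" (lazy). It matches as few characters as possible, so each match ends at the first closing parenthesis after the opening one. Without it, if: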

s = "A: (string1) this is text (string2) that's separated (string3) B"

then

s.gsub(/\(.*\)/,'')
  #=> "A:  B"
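One way to see the difference is to list what each pattern matches with String#scan. The following is a small sketch using the second example string above. The variable names greedy and lazy are just for illustration.

```ruby
s = "A: (string1) this is text (string2) that's separated (string3) B"

greedy = /\(.*\)/   # .* runs on to the LAST ")" in the string
lazy   = /\(.*?\)/  # .*? stops at the FIRST ")" after each "("

# Greedy: a single match spanning all three groups and the text between them.
p s.scan(greedy)  # => ["(string1) this is text (string2) that's separated (string3)"]
# Lazy: each parenthesized group is matched separately.
p s.scan(lazy)    # => ["(string1)", "(string2)", "(string3)"]

p s.gsub(greedy, '')  # => "A:  B"
p s.gsub(lazy, '')    # => "A:  this is text  that's separated  B"
```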

Edit: I ran the following benchmarks for various methods. You will see that there is one important take-away.

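require 'benchmark'

n = 10_000_000
s = "(string1) this is text (string2) that's separated (string3)"

Benchmark.bm(11) do |bm|  # 11 = length of the longest label
  bm.report 'sawa' do
    n.times { s.gsub(/\([^()]*\)/,'') }
  end
  bm.report 'cary' do
    n.times { s.gsub(/\(.*?\)/,'') }
  end
  bm.report 'cary1' do
    n.times { s.split(/\(.*?\)/).join }
  end
  bm.report 'sawa1' do
    n.times { s.split(/\([^()]*\)/).join }
  end
  bm.report 'sawa!' do
    # gsub! modifies s in place: only the first iteration removes anything,
    # later iterations find no match (gsub! returns nil), and any report
    # after this one runs on the already-stripped string.
    n.times { s.gsub!(/\([^()]*\)/,'') }
  end
  bm.report 'user1179871' do
    n.times { s.gsub(/\([\w\s]*\)/, '') }
  end
end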

              user     system      total        real
sawa        37.110000   0.070000  37.180000 ( 37.182598)
cary        37.000000   0.060000  37.060000 ( 37.066398)
cary1       35.960000   0.050000  36.010000 ( 36.009534)
sawa1       36.450000   0.050000  36.500000 ( 36.503711)
sawa!        7.630000   0.000000   7.630000 (  7.632278)
user1179871 38.500000   0.150000  38.650000 ( 38.666955)

I ran the benchmark several times and the results varied a fair bit. In some cases sawa was slightly faster than cary.

[Edit: I added a modified version of @user1179871's method to the benchmark above, but did not change any of the text of my answer. The modification is described in a comment on @user1179871's answer. It appears to be slightly slower than sawa and cary, but that may not be the case, because the benchmark times vary from run to run and I benchmarked the new method separately.]
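String#gsub! modifies its receiver and returns nil when it makes no substitution. As a result, in the sawa! report only the first of the n iterations actually removes anything. The sketch below shows that behaviour, then gives one possible way to time gsub! on a fresh copy on each iteration. The dup-per-iteration approach and the much smaller n are assumptions of this sketch, not part of the benchmark above.

```ruby
require 'benchmark'

re = /\([^()]*\)/
s  = "(string1) this is text (string2) that's separated (string3)"

t = s.dup
first  = t.gsub!(re, '')  # removes all three groups and returns t itself
second = t.gsub!(re, '')  # nothing left to remove, so it returns nil
p first   # => " this is text  that's separated "
p second  # => nil

# Give gsub! an unmodified copy on every iteration so each call does real work.
n = 100_000
Benchmark.bm(5) do |bm|
  bm.report('gsub')  { n.times { s.gsub(re, '') } }
  bm.report('gsub!') { n.times { s.dup.gsub!(re, '') } }
end
```

With a fresh copy each time, gsub! has to do the same matching work as gsub, so any remaining difference should come from allocation rather than from skipped work.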

edited Jan 25, 2015 at 8:10

answered Jan 23, 2015 at 3:52

Cary Swoveland



2 Comments

sawa Over a year ago

What you are testing under sawa-variants are your regexes.

Cary Swoveland Over a year ago

@MarcoPrins, good point. sawa and I both went by the statement of the question and didn't notice the last line was inconsistent with that.

sawa · Accepted Answer · 2015-01-23 08:10:41Z

2

Cary's answer is the simple way. This answer is the efficient way.

s.gsub(/\([^()]*\)/, "")

To keep in mind: non-greedy matching requires backtracking, and in general it is better not to use it if you can. But for such a simple task, Cary's answer is good enough.
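The character class [^()] matches any character except a parenthesis. This means [^()]* cannot run past a ")" and stops at the first closing parenthesis without needing a lazy quantifier. The following is a small sketch checking that the two patterns agree on the question's string. A made-up nested example is included to show they are not interchangeable in general.

```ruby
s = "(string1) this is text (string2) that's separated (string3)"

lazy    = /\(.*?\)/     # lazy: stop at the first ")" after "("
negated = /\([^()]*\)/  # [^()]* cannot cross a parenthesis at all

p s.scan(negated)                          # => ["(string1)", "(string2)", "(string3)"]
p s.gsub(negated, '') == s.gsub(lazy, '')  # => true

# With nested parentheses the results differ: the lazy pattern matches from
# the outer "(" to the inner ")", while [^()]* only matches the innermost group.
nested = "x (a (b) c) y"
p nested.gsub(lazy, '')     # => "x  c) y"
p nested.gsub(negated, '')  # => "x (a  c) y"
```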

edited Jan 23, 2015 at 8:10

answered Jan 23, 2015 at 3:56

sawa


3 Comments

Cary Swoveland Over a year ago

You're right: I ran each method 10 million times on the example string. sawa: 36.04 seconds, cary: 36.69 seconds, cary1: 35.98 seconds. cary1 is s.split(/\(.*?\)/).join (which I expected to come in last).

sawa Over a year ago

@CarySwoveland Then it must be gsub that is taking time. What happens if you try my regex with split and join? Or, just changing gsub to gsub! would improve the performance.

Cary Swoveland Over a year ago

I edited my answer to report the results of an expanded benchmark.
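The split-and-join variants discussed in these comments (cary1, sawa1) remove the groups by splitting the string on the pattern and joining the remaining pieces. The following is a short sketch checking that this gives the same result as gsub for both patterns. It relies on the patterns having no capture groups, because String#split includes captured text in its result.

```ruby
s = "(string1) this is text (string2) that's separated (string3)"

[/\(.*?\)/, /\([^()]*\)/].each do |re|
  parts = s.split(re)             # the text between the parenthesized groups
  p parts                         # => ["", " this is text ", " that's separated "]
  p parts.join == s.gsub(re, '')  # => true
end
```

The leading "" comes from the group at the very start of the string. split drops trailing empty strings, which does not affect the joined result.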

user1179871 · Accepted Answer · 2015-01-23 15:45:22Z

0

Try this:

string.gsub(/\({1}\w*\){1}/, '')

answered Jan 23, 2015 at 15:45

user1179871


2 Comments

Cary Swoveland Over a year ago

This is the same as string.gsub(/\(\w*\)/, ''); that is, {1} has no effect. Unfortunately, it doesn't work for string = "some say (string 1) is text but (string 2) is not."; string.gsub(/\({1}\w*\){1}/, '') #=> "some say (string 1) is text but (string 2) is not." The problem is the spaces within the parentheses. Changing the regex to permit spaces seems to work: string.gsub(/\([\w\s]*\)/, '') #=> "some say  is text but  is not." I'll add that to the benchmark.

Cary Swoveland Over a year ago

It occurred to me that there is still a problem: your regex does not allow for characters sometimes found in words (e.g., "what's", "twenty-two").
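The problems raised in these two comments can be checked directly. The following is a small sketch using the comment's example string plus a second string, odd, made up for illustration, that contains a hyphen and an apostrophe inside the parentheses. The negated class from sawa's answer is included for comparison.

```ruby
str = "some say (string 1) is text but (string 2) is not."
odd = "keep (twenty-two) and (what's this) out"  # hypothetical example string

word_only  = /\(\w*\)/      # equivalent to /\({1}\w*\){1}/: {1} changes nothing
word_space = /\([\w\s]*\)/  # also allows whitespace inside the parentheses
not_paren  = /\([^()]*\)/   # anything except a parenthesis

p str.gsub(word_only, '')   # space inside the parens, so nothing is removed
p str.gsub(word_space, '')  # => "some say  is text but  is not."
p odd.gsub(word_space, '')  # "-" and "'" are neither \w nor \s: nothing removed
p odd.gsub(not_paren, '')   # => "keep  and  out"
```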
