For a string like

s = "(string1) this is text (string2) that's separated (string3)"

I need a way to remove all the parentheses and the text inside them. However, if I use the following, it returns an empty string:

s.gsub(/\(.*\)/, "")

What can I use to get the following?

" this is text  that's separated "

3 Answers

4

You could do the following:

s.gsub(/\(.*?\)/,'')
  # => " this is text  that's separated "

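The ? after .* makes the quantifier "non-greedy", so it stops at the first closing parenthesis it can. Without it, .* is greedy and matches from the first "(" to the last ")". For example, if: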

s = "A: (string1) this is text (string2) that's separated (string3) B"

then

s.gsub(/\(.*\)/,'')
  #=> "A:  B" 
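To make the difference visible, here is a small sketch that uses String#scan to list what each pattern actually matches on the same string. The variable names greedy and lazy are only illustrative.

```ruby
s = "A: (string1) this is text (string2) that's separated (string3) B"

# Greedy: .* takes as much as it can, so there is a single match
# running from the first "(" to the last ")".
greedy = s.scan(/\(.*\)/)

# Non-greedy: .*? stops at the first ")" it can reach,
# giving one match per pair of parentheses.
lazy = s.scan(/\(.*?\)/)

puts greedy.inspect
puts lazy.inspect
```

The greedy pattern yields one long match, while the non-greedy one yields "(string1)", "(string2)" and "(string3)" separately, which is why gsub with the non-greedy pattern leaves the text between them intact.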

Edit: I ran the following benchmarks for various methods. You will see that there is one important take-away.

require 'benchmark'

n = 10_000_000
s = "(string1) this is text (string2) that's separated (string3)"

Benchmark.bm do |bm|
  bm.report 'sawa' do
    n.times { s.gsub(/\([^()]*\)/,'') }
  end 
  bm.report 'cary' do
    n.times { s.gsub(/\(.*?\)/,'') }
  end 
  bm.report 'cary1' do
    n.times { s.split(/\(.*?\)/).join }
  end 
  bm.report 'sawa1' do
    n.times { s.split(/\([^()]*\)/).join }
  end 
  # Note: gsub! mutates s in place, so only the first iteration finds anything to remove.
  bm.report 'sawa!' do
    n.times { s.gsub!(/\([^()]*\)/,'') }
  end
  bm.report 'user1179871' do
    n.times { s.gsub(/\([\w\s]*\)/, '') }
  end
end

              user     system      total        real
sawa        37.110000   0.070000  37.180000 ( 37.182598)
cary        37.000000   0.060000  37.060000 ( 37.066398)
cary1       35.960000   0.050000  36.010000 ( 36.009534)
sawa1       36.450000   0.050000  36.500000 ( 36.503711)
sawa!        7.630000   0.000000   7.630000 (  7.632278)
user1179871 38.500000   0.150000  38.650000 ( 38.666955)

I ran the benchmark several times and the results varied a fair bit. In some cases sawa was slightly faster than cary.
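The sawa! figure is worth reading with care. gsub! modifies the receiver in place and returns nil when it makes no substitution, so in a loop like the one above only the first iteration has any parentheses left to remove. A minimal sketch of that behaviour, working on a copy so the original string is untouched (the names copy, first and second are just for illustration):

```ruby
s = "(string1) this is text (string2) that's separated (string3)"
pattern = /\([^()]*\)/

copy = s.dup

# First call: substitutions are made and the modified receiver is returned.
first = copy.gsub!(pattern, "")

# Second call: nothing is left to match, so gsub! returns nil.
second = copy.gsub!(pattern, "")

puts first.inspect
puts second.inspect
puts s.inspect   # the original is unchanged because we worked on a copy
```

Under that reading, the gap between sawa and sawa! reflects how little work the later iterations do rather than a faster substitution.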

[Edit: I added a modified version of @user1179871's method to the benchmark above, but did not change any of the text of my answer. The modification is described in a comment on @user1179871's answer. It looks to be slightly slower than sawa and cary, but that may not be the case, as the benchmark times vary from run to run, and I did a separate benchmark of the new method.]

2 Comments

What you are testing under sawa-variants are your regexes.
@MarcoPrins, good point. sawa and I both went by the statement of the question and didn't notice the last line was inconsistent with that.
2

Cary's answer is the simple way. This answer is the efficient way.

s.gsub(/\([^()]*\)/, "")

Keep in mind: non-greedy matching requires backtracking, and in general it is better not to use it if you can avoid it. But for such a simple task, Cary's answer is good enough.
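On the example string both patterns give the same result. They differ once parentheses are nested, which the question does not cover, so treat the following as a sketch under that extra assumption. The string nested and the loop at the end are illustrative additions, not part of either answer.

```ruby
nested = "a (b (c) d) e"

# Non-greedy: starts at the first "(" and stops at the first ")",
# so it removes "(b (c)" and leaves a stray ")".
lazy_result = nested.gsub(/\(.*?\)/, "")

# Negated class: can only match a pair with no parentheses inside,
# so it removes just the innermost "(c)".
negated_result = nested.gsub(/\([^()]*\)/, "")

# Repeating the negated-class substitution until gsub! returns nil
# peels off one nesting level per pass.
stripped = nested.dup
loop { break unless stripped.gsub!(/\([^()]*\)/, "") }

puts lazy_result.inspect
puts negated_result.inspect
puts stripped.inspect
```

The negated class never produces an unbalanced result, which makes it the safer choice if nesting is possible.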

3 Comments

You're right: I ran the example string 10 million times. sawa: 36.04 seconds, cary: 36.69 seconds, cary1: 35.98 seconds. cary1 is s.split(/\(.*?\)/).join (which I expected to come in last).
@CarySwoveland Then it must be gsub that is taking time. What happens if you try my regex with split and join? Or, just changing gsub to gsub! would improve the performance.
I edited my answer to report the results of an expanded benchmark.
0

Try this:

string.gsub(/\({1}\w*\){1}/, '')

2 Comments

This is the same as string.gsub(/\(\w*\)/, ''); that is, {1} has no effect. Unfortunately, it doesn't work for string = "some say (string 1) is text but (string 2) is not."; string.gsub(/\({1}\w*\){1}/, '') #=> "some say (string 1) is text but (string 2) is not." The problem is the spaces inside the parentheses, which \w does not match. Changing the regex to permit spaces seems to work: string.gsub(/\([\w\s]*\)/, '') #=> "some say  is text but  is not." I'll add that to the benchmark.
It occurred to me that there still is a problem, in that your regex does not expect characters sometimes found in words (e.g., "what's", "twenty-two").
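A quick sketch of that limitation, using a made-up string whose parenthesized text contains an apostrophe and a hyphen. It compares the [\w\s] class with the negated class from sawa's answer.

```ruby
text = "say (what's up) now (twenty-two) ok"

# [\w\s] allows only word characters and whitespace, so neither group
# matches (one contains "'", the other "-") and the string is unchanged.
word_class = text.gsub(/\([\w\s]*\)/, "")

# [^()] allows anything except parentheses, so both groups are removed.
negated = text.gsub(/\([^()]*\)/, "")

puts word_class.inspect
puts negated.inspect
```

Listing what is forbidden between the parentheses, rather than what is allowed, avoids having to anticipate every character that might appear.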
