12

I have a list of words and a list of banned words. I want to go through the word list and redact all the banned words. This is what I ended up doing (notice the catched boolean):

puts "Give input text:"
text = gets.chomp
puts "Give redacted word:"
redacted = gets.chomp

words = text.split(" ")
redacted = redacted.split(" ")
catched = false

words.each do |word|
  redacted.each do |redacted_word|
    if word == redacted_word
      catched = true
      print "REDACTED "
      break
    end
  end
  if catched == true
    catched = false
  else
    print word + " "
  end
end

Is there a more idiomatic or efficient way to do this?

3 Answers

20

This also works:

words - redacted

The Array methods +, -, and & are simple and useful:

irb(main):016:0> words = ["a", "b", "a", "c"]
=> ["a", "b", "a", "c"]
irb(main):017:0> redacted = ["a", "b"]
=> ["a", "b"]
irb(main):018:0> words - redacted
=> ["c"]
irb(main):019:0> words + redacted
=> ["a", "b", "a", "c", "a", "b"]
irb(main):020:0> words & redacted
=> ["a", "b"]

1 Comment

The only problem is that this isn't very flexible. If you needed to make it case-insensitive for example, you'd have to switch to one of the other solutions.
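As a rough illustration of the case-insensitive variant this comment has in mind, here is a minimal sketch that switches from `-` to `reject`. The method name `remove_banned` is hypothetical and not part of the original answer.

```ruby
# Hypothetical helper: drops banned words, ignoring case.
def remove_banned(words, banned)
  # Normalize the banned list once so every lookup compares lowercase forms.
  banned_downcased = banned.map(&:downcase)
  words.reject { |word| banned_downcased.include?(word.downcase) }
end

p remove_banned(["A", "b", "a", "C"], ["a", "B"])
# => ["C"]
```

The same shape would work for other custom comparisons, since the block decides what counts as a match.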
16

You can use .reject to exclude all banned words that are present in the redacted array:

words.reject {|w| redacted.include? w}


If you want to get the list of banned words that are present in the words array, you can use .select:

words.select {|w| redacted.include? w}
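Since the question asks to replace banned words rather than drop them, the same `include?` check could be combined with `map`. This is only a sketch: the method name `redact_words` and the use of a `Set` for faster lookups are assumptions, not something this answer proposed.

```ruby
require "set"

# Hypothetical helper: replaces banned words instead of removing them.
def redact_words(words, banned, placeholder = "REDACTED")
  # A Set lookup avoids rescanning the banned array for every word.
  banned_set = banned.to_set
  words.map { |word| banned_set.include?(word) ? placeholder : word }
end

puts redact_words("a fnord is a fnord".split, ["fnord"]).join(" ")
# prints: a REDACTED is a REDACTED
```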


1 Comment

By the way, as a side note: is there any way to remove only the first occurrence?
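One possible reading of this follow-up is "remove only the first occurrence of each banned word". Under that assumption, a sketch using `Array#index` and `Array#delete_at` could look like this (the method name `remove_first_occurrences` is hypothetical):

```ruby
# Hypothetical helper: removes only the first occurrence of each banned word.
def remove_first_occurrences(words, banned)
  result = words.dup # work on a copy so the caller's array is left untouched
  banned.each do |banned_word|
    index = result.index(banned_word)
    result.delete_at(index) if index
  end
  result
end

p remove_first_occurrences(["a", "b", "a", "c"], ["a", "b"])
# => ["a", "c"]
```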
1

This might be a bit more 'elegant'. Whether it's more or less efficient than your solution, I don't know.

puts "Give input text:"
original_text = gets.chomp
puts "Give redacted word:"
redacted = gets.chomp

redacted_words = redacted.split

print(
  redacted_words.inject(original_text) do |text, redacted_word|
    text.gsub(/\b#{redacted_word}\b/, 'REDACTED')
  end
)

So what's going on here?

  • I'm using String#split without an argument, because ' ' is the default, anyway.
  • With Array#inject, the following block (starting at do and ending at end) is executed for each element in the array, in this case our list of forbidden words.
    • In each round, the second argument to the block will be the respective element from the array
    • The first argument to the block will be the block's return value from the previous round. For the first round, the argument to the inject function (in our case original_text) will be used.
    • The block's return value from the last round will be used as return value of the inject function.
  • In the block, I replace all occurrences of the currently handled redacted word in the text.
    • String#gsub performs a global substitution
    • As the pattern to be substituted, I use a regexp literal (/.../). Except, it's not really a literal, as I'm performing string interpolation (#{...}) on it to get the currently handled redacted word into it.
    • In the regexp, I surround the word to be redacted with \b word-boundary matchers. They match the boundary between word characters (letters, digits, and underscore) and non-word characters (or vice versa), without matching any of the characters themselves. (They match the zero-length 'position' between the characters.) If a string starts or ends with a word character, \b will also match the start or end of the string, respectively, so that we can use it to match whole words.
  • The result of inject (which is the result of the last execution of the block, i.e., the text when all the substitutions have taken place) is passed as an argument to print, which will output the now redacted text.
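
To make the rounds described above visible, here is a small sketch that prints the accumulated text after each pass of the block. The method name `redact_with_trace` and the sample input are made up for illustration; the banned words are hard-coded, so nothing untrusted is interpolated into the regexp.

```ruby
# Hypothetical helper that prints the accumulator after every round of inject.
def redact_with_trace(text, banned_words)
  banned_words.inject(text) do |current_text, banned_word|
    # current_text is the previous round's return value (or `text` in round one).
    replaced = current_text.gsub(/\b#{banned_word}\b/, "REDACTED")
    puts "after #{banned_word.inspect}: #{replaced}"
    replaced # becomes current_text in the next round
  end
end

redact_with_trace("a fnord and a foo", ["fnord", "foo"])
# prints:
#   after "fnord": a REDACTED and a foo
#   after "foo": a REDACTED and a REDACTED
```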

Note that, unlike your solution, mine will not consider punctuation as parts of adjacent words.

Also note that my solution will be vulnerable to regex injection.

Example 1:

Give input text:
A fnord is a fnord.
Give redacted word:
ford fnord foo

My output:

A REDACTED is a REDACTED.

Your output:

A REDACTED is a fnord.

Example 2:

Give input text:
A fnord is a fnord.
Give redacted word:
fnord.

My output:

A REDACTEDis a fnord.

(Note how the . was interpreted to match any character.)

Your output:

A fnord is a REDACTED.
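
The regex injection seen in Example 2 could probably be avoided by escaping each word with `Regexp.escape` before interpolating it. The following is a sketch under that assumption; the method name `redact_text` is hypothetical.

```ruby
# Hypothetical helper: same inject/gsub approach, but each word is escaped first.
def redact_text(text, banned_words, placeholder = "REDACTED")
  banned_words.inject(text) do |current_text, banned_word|
    # Regexp.escape turns metacharacters such as "." into literal characters.
    current_text.gsub(/\b#{Regexp.escape(banned_word)}\b/, placeholder)
  end
end

puts redact_text("abc a.c", ["a.c"])
# prints: abc REDACTED
```

With escaping, the `.` no longer matches arbitrary characters, so `abc` is left alone. A banned word that itself ends in punctuation, such as `fnord.`, still would not match at the very end of the text, because `\b` needs a word character on one side of the boundary.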
