Remove array elements that are present in another array

Question

There is a list of words and a list of banned words. I want to go through the word list and redact all the banned words. This is what I ended up doing (note the catched boolean):

puts "Give input text:"
text = gets.chomp
puts "Give redacted word:"
redacted = gets.chomp

words = text.split(" ")
redacted = redacted.split(" ")
catched = false

words.each do |word|
  redacted.each do |redacted_word|
    if word == redacted_word
      catched = true
      print "REDACTED "
      break
    end
  end
  if catched == true
    catched = false
  else
    print word + " "
  end
end

Is there a more proper or efficient way to do this?

pangpang · Accepted Answer · 2015-05-05 12:26:09Z


This also works:

words - redacted

The Array methods +, -, and & are simple and useful:

irb(main):016:0> words = ["a", "b", "a", "c"]
=> ["a", "b", "a", "c"]
irb(main):017:0> redacted = ["a", "b"]
=> ["a", "b"]
irb(main):018:0> words - redacted
=> ["c"]
irb(main):019:0> words + redacted
=> ["a", "b", "a", "c", "a", "b"]
irb(main):020:0> words & redacted
=> ["a", "b"]

edited May 5, 2015 at 12:26

answered May 5, 2015 at 12:12


1 Comment

Mark Thomas Over a year ago

The only problem is that this isn't very flexible. If you needed to make it case-insensitive for example, you'd have to switch to one of the other solutions.
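As a rough illustration of the flexibility point in the comment above, here is a minimal sketch of a case-insensitive variant built on reject. The sample data and the variable names banned and kept are made up for this example.

```ruby
# Hypothetical sample data, not taken from the question.
words = ["A", "Fnord", "is", "a", "FNORD"]
redacted = ["fnord"]

# Normalize the banned list once, then compare lowercased words against it.
banned = redacted.map(&:downcase)
kept = words.reject { |word| banned.include?(word.downcase) }

puts kept.join(" ")
```

Plain words - redacted compares elements for equality, so it has no place to hook in this kind of normalization without first transforming both arrays.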

souslov · Accepted Answer · 2015-05-05 15:41:18Z


You can use .reject to exclude all banned words that are present in the redacted array:

words.reject {|w| redacted.include? w}


If you want to get the list of banned words that are present in the words array, you can use .select:

words.select {|w| redacted.include? w}
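The question prints "REDACTED" in place of each banned word rather than dropping it. A minimal sketch of that behavior using the same include? test might look like the following. The method name redact_words and the placeholder parameter are assumptions for illustration, and the Set is an optional choice that makes each lookup cheaper when the banned list is long.

```ruby
require "set"

# Hypothetical helper: replaces banned words instead of removing them.
def redact_words(words, redacted, placeholder = "REDACTED")
  banned = redacted.to_set # hash-based lookup instead of scanning an array
  words.map { |word| banned.include?(word) ? placeholder : word }
end

puts redact_words(%w[A fnord is a fnord], %w[ford fnord foo]).join(" ")
```

Like the original loop, this compares whole tokens, so "fnord." with a trailing period would not be matched by "fnord".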


edited May 5, 2015 at 15:41

answered May 5, 2015 at 11:50


1 Comment

mirageglobe Over a year ago

By the way, a side question: is there any way to remove only the first occurrence?
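One possible way to remove only the first occurrence of each banned word is to combine Array#index with Array#delete_at. This is a sketch, and the method name remove_first_occurrences is made up for this example.

```ruby
# Hypothetical helper: deletes at most one occurrence per banned word.
def remove_first_occurrences(words, redacted)
  result = words.dup # leave the caller's array untouched
  redacted.each do |banned_word|
    index = result.index(banned_word) # nil when the word is absent
    result.delete_at(index) if index
  end
  result
end

p remove_first_occurrences(["a", "b", "a", "c"], ["a", "b"])
```

With the arrays from the irb session above, the second "a" survives, whereas words - redacted removes every occurrence.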

das-g · Accepted Answer · 2015-05-05 14:39:03Z

This might be a bit more 'elegant'. Whether it's more or less efficient than your solution, I don't know.

puts "Give input text:"
original_text = gets.chomp
puts "Give redacted word:"
redacted = gets.chomp

redacted_words = redacted.split

print(
  redacted_words.inject(original_text) do |text, redacted_word|
    text.gsub(/\b#{redacted_word}\b/, 'REDACTED')
  end
)

So what's going on here?

I'm using String#split without an argument, because splitting on whitespace (as with ' ') is the default anyway.
With Array#inject, the following block (starting at do and ending at end) is executed for each element in the array, in this case our list of forbidden words.
- In each round, the second argument to the block will be the respective element from the array.
- The first argument to the block will be the block's return value from the previous round. For the first round, the argument to the inject function (in our case original_text) will be used.
- The block's return value from the last round will be used as the return value of the inject function.
In the block, I replace all occurrences of the currently handled redacted word in the text.
- String#gsub performs a global substitution.
- As the pattern to be substituted, I use a regexp literal (/.../). Except it's not really a literal, as I'm using string interpolation (#{...}) to get the currently handled redacted word into it.
- In the regexp, I'm surrounding the word to be redacted with \b word boundary matchers. They match the boundary between word characters (letters, digits, and underscores) and non-word characters (or vice versa), without matching any of the characters themselves. (They match the zero-length 'position' between the characters.) If a string starts or ends with a word character, \b will also match the start or end of the string, respectively, so that we can use it to match whole words.
The result of inject (which is the result of the last execution of the block, i.e., the text after all the substitutions have taken place) is passed as an argument to print, which will output the now redacted text.
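To make the round-by-round behavior of inject described above concrete, here is a small sketch that records the first block argument in each round. The steps array and the sample sentence are added purely for illustration, and plain string patterns are used instead of a regexp to keep the trace simple.

```ruby
steps = [] # collects the value of `text` at the start of each round

result = ["fnord", "foo"].inject("A fnord ate foo.") do |text, word|
  steps << text
  text.gsub(word, "REDACTED") # becomes `text` in the next round
end

p steps
puts result
```

The first recorded value is the initial argument to inject, the second is the block's return value from the first round, and result is the return value from the last round.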

Note that, unlike your solution, mine does not treat punctuation as part of adjacent words.

Also note that my solution is vulnerable to regex injection.

Example 1:

Give input text:
A fnord is a fnord.
Give redacted word:
ford fnord foo

My output:

A REDACTED is a REDACTED.

Your output:

A REDACTED is a fnord.

Example 2:

Give input text:
A fnord is a fnord.
Give redacted word:
fnord.

My output:

A REDACTEDis a fnord.

(Note how the . was interpreted to match any character.)

Your output:

A fnord is a REDACTED.
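One common way to avoid the regex injection shown in Example 2 is to pass each word through Regexp.escape before interpolating it. The following is a sketch of that variation, not part of the original answer, and the method name redact_escaped is made up for this example.

```ruby
# Hypothetical helper: same inject/gsub approach, but with escaped input.
def redact_escaped(text, banned_words)
  banned_words.inject(text) do |result, banned_word|
    # Regexp.escape turns "." into "\\.", so it only matches a literal period.
    result.gsub(/\b#{Regexp.escape(banned_word)}\b/, "REDACTED")
  end
end

puts redact_escaped("A fnord is a fnord.", %w[ford fnord foo])
puts redact_escaped("A fnord is a fnord.", ["fnord."])
```

With escaping, the input "fnord." from Example 2 no longer eats the following space. In that example it matches nothing at all, because the trailing \b cannot match between a period and the end of the string.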
