4

I have a string, for example:

'This is a test string'

and an array:

['test', 'is']

I need to find out how many elements of the array are present in the string (in this case, it would be 2). What's the best/Ruby way of doing this? Also, I am doing this thousands of times, so please keep efficiency in mind.

What I tried so far:

count = 0
array.each do |el|
  count += 1 if string.include?(el) # increment counter on a match
end

Thanks

2
  • @SergioTulentsev I looped through the array and used include? method. Commented Oct 12, 2012 at 13:36
  • What do you consider a match? For example, do you count "is" to be matched by the word "This" or do you only count full word matches? Commented Oct 12, 2012 at 13:42

5 Answers

7
['test', 'is'].count { |s| /\b#{Regexp.escape(s)}\b/ =~ 'This is a test string' }

Edit: adjusted for full word matching.

1 Comment

@0xSina you're welcome. Try this out.
3
['test', 'is'].count { |e| 'This is a test string'.split.include? e }

5 Comments

It's ['test', 'is'].count { |e| 'This is a test string'.include? e }, if you want to go down that road :)
Almost, he used regex to count the words.
That's the reason I find these algorithms fairly inefficient, regex more so than #include? variety, but it is of no consequence for small n.
The OP is trying to find full word occurrences and String#include? would not work for that. 'hello'.include?('hell') # => true
@megas Yes. I was really commenting on Boris' "regex more so than #include" comment.
2

Your question is ambiguous.

If you are counting the occurrences, then:

('This is a test string'.scan(/\w+/).map(&:downcase) & ['test', 'is']).length

If you are counting the tokens, then:

(['test', 'is'] & 'This is a test string'.scan(/\w+/).map(&:downcase)).length

You can further speed up the calculation by replacing Array#& by some operation using a Hash (or Set).
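
A minimal sketch of the Set idea, assuming case-insensitive whole-word matching as in the snippets above. The method name count_present is hypothetical, not something from the answer:

```ruby
require 'set'

# Hypothetical helper: counts how many distinct entries of `words`
# occur as whole words in `text`, ignoring case.
def count_present(text, words)
  # Tokenize once and store the tokens in a Set, so each membership
  # check is a hash lookup instead of a scan over an array.
  tokens = text.scan(/\w+/).map(&:downcase).to_set
  words.map(&:downcase).uniq.count { |w| tokens.include?(w) }
end

puts count_present('This is a test string', ['test', 'is'])
```

If the same string is checked against many different arrays, the token Set could be built once outside the method and reused.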

3 Comments

While your answer is extremely interesting, the question is whether it is sufficiently general. What would happen if some of the match strings match the same word (not the case now, but could be in general)?
@BorisStitnicky I think you are realizing the same ambiguity in the question as I did. See my edit.
Yeah, I never said it was your fault. But I must admit it, this question is an interesting refreshment from my boring programming task at hand today :)))
0

Kyle's answer gave you the simple practical way of doing the job. But looking at it, allow me to remark that more efficient algorithms exist to solve your problem, when n (string length and/or number of matched strings) climbs to millions. We commonly encounter such problems in biology.
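
The answer does not name a specific algorithm. One option that stays within plain Ruby, offered here as an assumption rather than as what the author had in mind, is to combine all the words into a single pattern with Regexp.union and scan the string once. The method name count_with_union is hypothetical:

```ruby
# Hypothetical helper: scans `text` once with a combined pattern and
# counts how many distinct `words` appear as whole words (case-sensitive).
def count_with_union(text, words)
  return 0 if words.empty?
  # Regexp.union escapes each string and joins the alternatives with "|".
  pattern = /\b(?:#{Regexp.union(words).source})\b/
  # With no capture groups, String#scan returns the matched substrings.
  text.scan(pattern).uniq.size
end

puts count_with_union('This is a test string', ['test', 'is'])
```

Whether this actually beats the per-word approaches depends on the regex engine and the data, so it is worth benchmarking on realistic input before adopting it.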

0

The following will work provided there are no duplicates in the string or the array.

str = "This is a test string"
arr = ["test", "is"]

match_count = arr.size - (arr - str.split).size # 2 in this example
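
If duplicates are possible, one way around the caveat is to intersect the two arrays instead. A sketch, assuming whitespace-separated, case-sensitive words as in the snippet above:

```ruby
str = "This is a test string"
arr = ["test", "is", "is"] # duplicate added for illustration

# Array#& returns only the unique elements common to both arrays,
# so repeated entries in either input are counted once.
match_count = (arr & str.split).size
puts match_count
```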
