2

I am working on a Ruby program that scrapes email addresses, so simply calling .match(/some regex/) can only be part of the solution. There is no perfect regex for this problem in any language.

Either the expression accepts too many strings, resulting in false-positive matches, or valid results are excluded. I am using a regex for email "validation" (actually email "suspicion" is a more apt term) that casts a "wide net".

This strategy allows me to maximize positive results by storing the suspected addresses in an array and iterating through to deal with edge cases. This question revolves around one particular edge case.

Take for example the string:

desktop_variety_top@728x90

The logic to handle strings like this one would be to purge any string that contains no periods between the @ and the end of the string.

So we might be looking at something like:

def purge_edge_case(array)
  array.reject! { |s| s.<first_condition>? && s.<second_condition>? }
end

Figuring out the two string-based conditions is where I'm currently stuck.

8
  • Possible duplicate of What is the best/easy way to validate an email address in Ruby? Commented May 6, 2017 at 16:56
  • I don't think so. There are many regular expressions to match email addresses written in all the major programming languages. The problem is that none of them is perfect, so the "net" is invariably cast either too wide or too narrow. The optimal solution in scraping applications (which is what I am working on) is to cast the net wide and then whittle down the list through a series of steps. This question represents one such step. Commented May 6, 2017 at 17:20
  • I'm a bit lost. What is a 'conditional regex statement'? Second, why are you showing 2 conditions for testing for periods? Lastly, there are no all seeing solutions, as you mentioned, so what makes you think you are going to create one? Commented May 6, 2017 at 17:23
  • I think you need one regex - /@[^@.]+\z/ matching any string that has no dot in between the last @ and the end of the string. Commented May 6, 2017 at 17:29
  • 1
    Ruby does support conditional regex statements: rubular.com/r/qyxnL8RQpQ Commented May 6, 2017 at 17:37
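
As a minimal sketch, the single regex suggested in the comments above could be dropped into the question's method like this. The constant name and the sample array are assumptions made for illustration, and the non-destructive reject is used here instead of the question's reject! so that the method always returns an array.

```ruby
# Regex taken from the comment above: an "@" followed only by characters
# that are neither "@" nor ".", all the way to the end of the string.
NO_DOT_AFTER_AT = /@[^@.]+\z/ # constant name is hypothetical

# Returns a new array without the strings that have no dot after the last "@".
def purge_edge_case(array)
  array.reject { |s| s.match?(NO_DOT_AFTER_AT) }
end

suspects = ["desktop_variety_top@728x90", "user@example.com"] # sample data
purge_edge_case(suspects)
```

Because the character class is followed by +, a string that ends in a bare "@" is not matched and would have to be filtered by a separate step. String#match? requires Ruby 2.4 or later; on older versions =~ should behave the same way here.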

2 Answers

2

There is no need for regex here:

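test = input.split('@')
# Exactly one "@", and the part after it contains a dot that is
# neither its first nor its last character.
test.size == 2 \
   && !test.last.start_with?('.') \
   && !test.last.end_with?('.') \
   && test.last.include?('.')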

Or, less strict, exactly as you requested:

test.size == 2 && test.last[/\./] # at least one dot after `@`
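
To connect this back to the array of suspected addresses from the question, the less strict check could be wrapped in a predicate and used as a filter. This is only a sketch: the method name dot_after_at? and the sample data are hypothetical, and include? is used in place of the regex lookup so the predicate returns a strict boolean.

```ruby
# Hypothetical predicate: true when the string has exactly two
# "@"-separated parts and the second part contains a dot.
def dot_after_at?(input)
  parts = input.split('@')
  parts.size == 2 && parts.last.include?('.')
end

suspects = ["desktop_variety_top@728x90", "user@example.com", "a@b@c.com"] # sample data
kept = suspects.select { |s| dot_after_at?(s) }
```

Note that String#split drops trailing empty fields, so a string ending in "@" is not counted as having an extra part; whether that matters depends on what the wide-net regex lets through.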

0

Here is the completed method that solves the problem:

def purge_edge_case(array)
    array.reject! { |s| s.match(/@.*/).to_s != nil && s.match(/@.*/).to_s.match(/\./) == nil }
end

7 Comments

How on Earth could that have been upvoted? to_s != nil is nonsense, and the whole answer is a perfect example of code smell and bad practice. Flagged for mod attention.
@mudasobwa: Just curious : why mod attention?
@mudasobwa I am sure that there are cleaner ways to write the code. However to declare it as "nonsense" is nonsense! The code is in fact valid Ruby, and not only does it run (without error I might add), but it also solves the issue I raised in my question.
@EricDuminil the question “How would I use regexp to detect an email” and a clumsy answer from the OP receive 2 and 1 upvotes respectively.
@mudasobwa I upvoted the question to give more visibility to your answer.