I'm using the following VB.NET code to find phone numbers in HTML and make them "clickable":

 Regex.Replace(pDisp.Document.Body.innerHTML, "([0-9+ ]{3,6}[\s]{1,1}[0123456789 \-/]{4,15})", "<a href=http://DIAL/$1>$1</a>")

A problem arises when the numbers contain spaces, for example:

089 12233 455

This will be replaced with:

<a href=http://DIAL/089 12233 455>089 12233 455</a>

Is there a way to get

<a href=http://DIAL/08912233455>089 12233 455</a>

instead?

Thank you very much!

3 Answers

2

Instead of <a href=http://DIAL/$1>$1</a>, use:

<a href=http://DIAL/$1>$0</a>

so that the link text is the whole match ($0), which keeps the original formatting.

3 Comments

I'm not suggesting you change the part within the href (because your OP looked like that was working). I'm only suggesting changing the text between the <a> and </a> tags.
Okay, thanks... Unfortunately this is not working... It would be more important to change the href itself...
Hmm.. I think I've misinterpreted your question. Ahmad's MatchEvaluator looks like a better option.
1

You can solve this with the Regex.Replace overload that accepts a MatchEvaluator.

Example:

' Requires: Imports System.Text.RegularExpressions
Dim pattern = "([0-9+ ]{3,6}[\s]{1,1}[0123456789 \-/]{4,15})"
Dim inputs As String() = { "089 12233 455", "0711 123 00 376", "0711 5600920", "0711 62009211", "0711 620092 11", "+49 711 123 00 376", "0049 711 5600920" }

For Each input In inputs
    Console.WriteLine(input)
    ' The lambda is the MatchEvaluator: spaces are removed for the href,
    ' and the original match is kept as the link text.
    Dim result = Regex.Replace(input, pattern,
        Function(m) "<a href=http://DIAL/" & m.Value.Replace(" ", "") & ">" & m.Value & "</a>")
    Console.WriteLine("Result: {0}", result)
    Console.WriteLine()
Next

The lambda receives each Match and builds the link: the href uses the matched value with the spaces removed, while the link text keeps the original value unaltered. If the concatenation is hard to read, String.Format is an alternative. If the href should also drop the leading plus sign, you could chain another String.Replace or perform a regex replace on [+ ] to remove both spaces and plus signs.
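As a minimal sketch of the String.Format and [+ ] variants mentioned above: the module name DialLinks and the helpers BuildLink and Linkify are hypothetical names chosen for illustration, and the pattern is copied from the question. It assumes that dropping the plus sign is acceptable for your dialer.

Imports System
Imports System.Text.RegularExpressions

Module DialLinks
    ' Same pattern as in the question.
    Private Const PhonePattern As String = "([0-9+ ]{3,6}[\s]{1,1}[0123456789 \-/]{4,15})"

    ' Hypothetical helper: removes spaces and plus signs for the href
    ' and keeps the matched text unchanged as the link text.
    Function BuildLink(m As Match) As String
        Dim digits As String = Regex.Replace(m.Value, "[+ ]", "")
        Return String.Format("<a href=http://DIAL/{0}>{1}</a>", digits, m.Value)
    End Function

    ' Hypothetical helper: applies BuildLink to every match in the input.
    Function Linkify(html As String) As String
        Return Regex.Replace(html, PhonePattern, New MatchEvaluator(AddressOf BuildLink))
    End Function

    Sub Main()
        ' Prints: <a href=http://DIAL/08912233455>089 12233 455</a>
        Console.WriteLine(Linkify("089 12233 455"))
        ' Prints: <a href=http://DIAL/4971112300376>+49 711 123 00 376</a>
        Console.WriteLine(Linkify("+49 711 123 00 376"))
    End Sub
End Module

A named function instead of an inline lambda keeps the Regex.Replace call short. Dashes and slashes are left in the href here, since only spaces and plus signs were discussed.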

I also think you can shorten your original regex to "[0-9+ ]{3,6}\s[0-9 -]{4,15}". Compared to your original pattern, [\s]{1,1} becomes \s, and [0123456789 \-/] uses the same 0-9 range you already use earlier in the pattern. As long as the dash is placed at the beginning or end of the character class, it doesn't need to be escaped. Lastly, I removed the / since none of your examples contain a forward slash; keep it if your numbers can contain one.
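To check the shortened pattern against your own data, a rough comparison along these lines may help. The module name PatternCheck and the function SameMatch are hypothetical, and the last input (with a slash) is an invented test value that does not come from the question.

Imports System
Imports System.Text.RegularExpressions

Module PatternCheck
    ' Hypothetical helper: True when both patterns return the same first match.
    Function SameMatch(input As String) As Boolean
        Dim original As String = "([0-9+ ]{3,6}[\s]{1,1}[0123456789 \-/]{4,15})"
        Dim shortened As String = "[0-9+ ]{3,6}\s[0-9 -]{4,15}"
        Return Regex.Match(input, original).Value = Regex.Match(input, shortened).Value
    End Function

    Sub Main()
        ' The last entry is an invented example containing a forward slash.
        Dim inputs As String() = {"089 12233 455", "0711 123 00 376", "0711 5600920", "+49 711 123 00 376", "0711 5600/920"}
        For Each input As String In inputs
            Console.WriteLine("{0} -> {1}", input, SameMatch(input))
        Next
    End Sub
End Module

For the numbers taken from the question, both patterns should give the same match. The slash example differs because the shortened pattern stops matching at the slash.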

0

You could break your capture groups up. Then, in your replace, do something like this:

"<a href=http://DIAL/$1$2$3>$1 $2 $3</a>"

5 Comments

That would also make every match space-separated for the output (as long as that's acceptable).
I have already tried this. My problem is that there can be many different formats: +49 89 123456 789, 089 123 456 789, 089 123456789
Will the # of digits always be the same? --edit - I guess not, since you gave examples with different numbers of digits :)
No... These are some real-world examples:
0711 123 00 376, 0711 5600920, 0711 62009211, 0711 620092 11, +49 711 123 00 376, 0049 711 5600920
