I have this regular expression to match link-like parts of text (http:// and similar):
([A-Za-z]{3,9}):\/\/([-;:&=\+\$,\w]+@{1})?([-A-Za-z0-9\.]+)+:?(\d+)?(\/[-\+~%\/\.\w]+)?\??([-\+=&;%@\.\w]+)?#?([\w]+)?
and I later convert the matches to hyperlinks in code. It works well.
However, an http:// URL can also appear inside an <img> tag:
<img src="http://www.nature.com/images/home_03/main_news_pic2013.02.19.jpg" alt="Pulpit rock" width="304" height="228">
So I have to modify the existing regex so that it does NOT match a link-like part of text that is preceded by a quotation mark or an apostrophe. How can I avoid matching:
"http
I tried putting [^"|'] in front:
[^"|']([A-Za-z]{3,9}):\/\/ ..........
but it does not work.
Use a negative lookbehind: (?<!["']). But a more reliable approach would be to parse the HTML and then process the text nodes only. After all, there might be all sorts of reasons why a URL is preceded by a quotation mark.

I have http://some.domen.com as part of the text without an <a> tag, because the user typed it like that. My task is to search for those link-like parts of text and convert them to real hyperlinks (just adding an <a> tag). So, I cannot use the DOM to locate them. Am I right?
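For what it's worth, here is a minimal sketch of both ideas from this thread, the negative lookbehind (written `(?<!...)`) and the "text nodes only" approach. The thread does not say which language is in use, so Python is an assumption, and the URL pattern is a deliberately simplified stand-in for the one in the question. The names `linkify`, `TextNodeLinker` and `linkify_html` are made up for illustration. One detail worth noting: the lookbehind alone is not enough, because the engine can simply start one character later and match `ttp://...`; that is probably also why the `[^"|']` attempt fails, since the class consumes the `h`. The `\b` in the sketch guards against this.

```python
import re
from html.parser import HTMLParser

# Simplified URL pattern (an assumption, not the full pattern from the question).
# (?<!["']) rejects a match directly preceded by a quote; \b stops the engine
# from sliding one character forward and matching "ttp://..." instead.
URL = re.compile(r"""(?<!["'])\b[A-Za-z]{3,9}://[^\s<>"']+""")


def linkify(text):
    """Wrap every URL-like match in an <a> tag."""
    return URL.sub(lambda m: '<a href="{0}">{0}</a>'.format(m.group(0)), text)


class TextNodeLinker(HTMLParser):
    """Re-emits the markup unchanged and linkifies text nodes only."""

    SKIP = {"a", "script", "style"}  # text inside these is left alone

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        self.out.append(self.get_starttag_text())  # raw tag, attributes intact

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1
        self.out.append("</{}>".format(tag))

    def handle_data(self, data):
        self.out.append(data if self.skip_depth else linkify(data))

    def handle_entityref(self, name):
        self.out.append("&{};".format(name))

    def handle_charref(self, name):
        self.out.append("&#{};".format(name))

    def handle_comment(self, data):
        self.out.append("<!--{}-->".format(data))

    def handle_decl(self, decl):
        self.out.append("<!{}>".format(decl))


def linkify_html(html):
    parser = TextNodeLinker()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Regarding the follow-up question: a URL the user typed as plain text is still a text node once the HTML is parsed, so the parser-based route can find it. Attribute values such as `src="..."` never reach `handle_data`, and text that already sits inside an `<a>` element is skipped, which the lookbehind alone would not catch because that text is preceded by `>` and not by a quote. This is only a sketch: it does not trim trailing punctuation from URLs, and a URL containing an entity such as `&amp;` is split at the entity.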