I'm new to Stack Overflow and I'm from South Korea.
I'm having difficulties with regex in PHP.
I want to select all the URLs from user-submitted HTML source.
The restrictions I want to apply are as follows.
Select URLs EXCEPT:
1. URLs that are within tags. For example, if the HTML source is like below,
<a href="http://aaa.com">http://aaa.com</a>
neither occurrence of http://aaa.com should be selected.
2. URLs right after " or =
Here is my current regex:
/(?<![\"=])https?\:\/\/[^\"\s<>]+/i
But with this regex, I can't achieve the first rule.
I tried adding a negative lookahead at the end of my current regex, like this:
/(?<![\"=])https?\:\/\/[^<>\"\s]+(?!<\/a>)/i
It still matches part of the second URL in the a tag, as below.
http://aaa.co
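A minimal reproduction of this behavior, using the sample HTML and the second regex from above. The explanation in the comments is my understanding of how backtracking interacts with the lookahead, not something confirmed elsewhere in this post.

```php
<?php
// Sample HTML from the question.
$html = '<a href="http://aaa.com">http://aaa.com</a>';

// Second attempt from the question: negative lookahead for </a> at the end.
$pattern = '/(?<![\"=])https?\:\/\/[^<>\"\s]+(?!<\/a>)/i';

preg_match_all($pattern, $html, $matches);

// The greedy [^<>\"\s]+ first consumes "http://aaa.com". The lookahead then
// fails because "</a>" follows, so the engine backtracks by one character
// and succeeds at "http://aaa.co", which is followed by "m" rather than "</a>".
print_r($matches[0]);
```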
We don't have a developer Q&A community like Stack Overflow in Korea, so I really hope someone can help with this simple-looking regex issue!
With DOMDocument, $links = $dom->getElementsByTagName('a'); gives you all the link elements. Then simply loop over them and get each URL with $link->getAttribute('href'); which returns the attribute value as a string. If certain URLs should be skipped, that's where a regex fits in. To get the link text, $link->nodeValue should work, as should the textContent property of any DOMNode instance, or you can simply strip the markup tags from your HTML by calling strip_tags.
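
The approach above could be sketched roughly as follows. This is only an illustration under a few assumptions: the sample markup extends the question's example with a made-up bare URL (http://bbb.com), and the DOMXPath step that scans only text nodes outside <a> elements is one possible way to apply the question's first rule, not part of the original suggestion.

```php
<?php
// Sample input: the question's link plus a hypothetical bare URL outside any tag.
$html = '<p><a href="http://aaa.com">http://aaa.com</a> see http://bbb.com</p>';

$dom = new DOMDocument();
// Collect libxml warnings internally, since user-submitted markup is often sloppy.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

// Collect href values and link texts from every <a> element.
$hrefs = [];
$texts = [];
foreach ($dom->getElementsByTagName('a') as $link) {
    $hrefs[] = $link->getAttribute('href'); // already a string
    $texts[] = $link->textContent;
}

// Assumption: to find URLs that are NOT inside tags, look only at text nodes
// that have no <a> ancestor. Attribute values are never text nodes here.
$xpath = new DOMXPath($dom);
$bare = [];
foreach ($xpath->query('//text()[not(ancestor::a)]') as $node) {
    if (preg_match_all('/https?:\/\/[^\s<>"]+/i', $node->nodeValue, $m)) {
        $bare = array_merge($bare, $m[0]);
    }
}

print_r($hrefs);
print_r($texts);
print_r($bare);
```

Because the parser separates attribute values and link text from the remaining text, the regex no longer needs lookbehinds or lookaheads to guess at the surrounding markup.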