
I want to match "https://www.mysite/embed/M7znk1c-ay0" only if it is not inside an HTML comment.

So don't match this line:

<!--<p><iframe src="https://www.mysite/embed/M7znk1c-ay0" width="854" height="480" frameborder="0" allowfullscreen="allowfullscreen"></iframe>-->

but match this line:

<article class="art-post"><div class="art-postcontent clearfix"><div class="art-article"><p><iframe  src="https://www.mysite/embed/M7znk1c-ay0" ></iframe></p>

I tried the pattern ^(?=<!--).*www.mysite\/embed\/+[\w\-]* but it isn't quite working.

  • Possible duplicate of C# parse html with xpath. There is no need for regex in your situation. Commented Sep 9, 2019 at 18:53

2 Answers


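You almost had it. Your pattern uses a positive lookahead, (?=<!--), which only succeeds on lines that start with a comment; what you need is a negative lookahead, (?!<!--). The corrected regex is ^(?!<!--).*"(.*www.mysite\/embed\/+[\w\-]*), which captures the URL in group 1.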
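
As a rough sketch of how that pattern could be applied in C#, the snippet below runs it against the two sample lines from the question. The class name LineLookaheadDemo and the method FindUrl are made up for illustration, and RegexOptions.Multiline is an assumption so that ^ anchors at the start of every line when the input holds several lines.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineLookaheadDemo
{
    // Pattern from the answer above; "" is an escaped quote inside a verbatim string.
    // Multiline (an assumption) makes ^ match at the start of each line.
    private static readonly Regex EmbedUrl = new Regex(
        @"^(?!<!--).*""(.*www.mysite\/embed\/+[\w\-]*)",
        RegexOptions.Multiline);

    // Hypothetical helper: returns the captured URL, or an empty string
    // when the line starts with "<!--" or contains no embed URL.
    public static string FindUrl(string line)
    {
        Match m = EmbedUrl.Match(line);
        return m.Success ? m.Groups[1].Value : string.Empty;
    }

    public static void Main()
    {
        string commented = "<!--<p><iframe src=\"https://www.mysite/embed/M7znk1c-ay0\" width=\"854\"></iframe>-->";
        string live = "<div class=\"art-article\"><p><iframe  src=\"https://www.mysite/embed/M7znk1c-ay0\" ></iframe></p>";

        Console.WriteLine("commented: '" + FindUrl(commented) + "'"); // empty: line starts with <!--
        Console.WriteLine("live:      '" + FindUrl(live) + "'");      // the embed URL
    }
}
```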


HTML is not a regular language, so parsing it with regular expressions is usually not a good idea. @csabinho's answer, ^(?!<!--).*"(.*www.mysite\/embed\/+[\w\-]*), won't work reliably when the URL you want to match is in the middle of a page, because it only checks that the line does not begin with a comment.

Best practice would be to build a DOM and use XPath to query the XML-like content.
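
A minimal sketch of the DOM approach, assuming the third-party HtmlAgilityPack NuGet package (my choice for illustration; any HTML parser with XPath support would do). The class and method names are hypothetical. Because the parser treats a comment as a comment node rather than as elements, an iframe inside <!-- ... --> is never selected by the XPath query.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // assumption: installed via NuGet, not part of the .NET base library

public static class IframeSrcFinder
{
    // Hypothetical helper: returns the src of every iframe element
    // whose src attribute contains the embed path.
    public static List<string> FindEmbedUrls(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var urls = new List<string>();
        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(
            "//iframe[contains(@src, 'www.mysite/embed/')]");
        if (nodes == null)
        {
            return urls;
        }

        foreach (HtmlNode node in nodes)
        {
            urls.Add(node.GetAttributeValue("src", ""));
        }
        return urls;
    }
}
```

Calling FindEmbedUrls on a page that contains both sample lines from the question should return only the URL from the live iframe.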

Edit:

By the way, you can use the following code to remove the comments first:

using System.Text.RegularExpressions;
...
// Lazily match each comment; Singleline lets '.' span line breaks.
string pattern = @"<!--.*?-->";
var res = Regex.Replace(input, pattern, "", RegexOptions.Singleline);

and then use a simple pattern to extract the URL from the result.
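
Putting the two steps together might look like the sketch below. The class EmbedUrlExtractor, the method Extract, and the extraction pattern https?://www\.mysite/embed/[\w\-]+ are my own assumptions for illustration; the comment pattern here uses .*? so that an empty comment is also removed.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class EmbedUrlExtractor
{
    // Step 1: lazily match each HTML comment, including ones that span lines.
    private static readonly Regex HtmlComment =
        new Regex(@"<!--.*?-->", RegexOptions.Singleline);

    // Step 2: a simple URL pattern (an assumption; adjust to the real URL shape).
    private static readonly Regex EmbedUrl =
        new Regex(@"https?://www\.mysite/embed/[\w\-]+");

    // Hypothetical helper: returns every embed URL found outside HTML comments.
    public static List<string> Extract(string html)
    {
        string withoutComments = HtmlComment.Replace(html, "");
        var urls = new List<string>();
        foreach (Match m in EmbedUrl.Matches(withoutComments))
        {
            urls.Add(m.Value);
        }
        return urls;
    }

    public static void Main()
    {
        string html =
            "<!--<p><iframe src=\"https://www.mysite/embed/M7znk1c-ay0\" width=\"854\"></iframe>-->\n" +
            "<div class=\"art-article\"><p><iframe  src=\"https://www.mysite/embed/M7znk1c-ay0\" ></iframe></p>";

        // Only the iframe outside the comment survives step 1.
        foreach (string url in Extract(html))
        {
            Console.WriteLine(url);
        }
    }
}
```

This is still a regex-based workaround, so it inherits the usual caveats: text that merely looks like a comment inside a script block or an attribute value would be stripped as well.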
