I'm trying to extract and replace or remove sections of a customised HTML document.

<div>
    stuff I want to keep
</div>
<i class="special_i">@if includesDownloadables</i>
    This is some stuff I want to replace on a certain condition.
<i class="special_i">@endif includesDownloadables</i>
<div>
    More stuff to keep
</div>
<i class="special_i">@if instantDownload</i>
    More stuff that is conditional.
<i class="special_i">@endif instantDownload</i>
<div>
    More stuff to keep
</div>

I have been messing on regex101.com and experimenting myself, with this:

string condition = "includesDownloadables";
string pattern = @"<i\s.*?(@if {key}).*?(@endif {key}<\/i>)"; 
Regex regex = new Regex(pattern.Replace("{key}", condition ), RegexOptions.Multiline | RegexOptions.Singleline | RegexOptions.IgnoreCase);
string MyString = "All the html to read";
MyString = regex.Replace(MyString , "");

The end of my regex seems to be working, but the match starts at the first <i> tag in the document, not specifically the one containing the condition I am looking for.

Note: this must also allow for the <i> tag having any attributes.

So how can I make the match start at the <i> tag that contains the text @if MyCondition, where the tag can have any data, style or class attributes, and only if that tag contains the text I am looking for?

  • Well, if you are working with HTML, you are better off using the HTML Agility Pack (see "How to Use HTML Agility Pack in C#"). Commented Jun 30, 2023 at 14:29
  • You can try <i\b(?:(?!<i\b).)*?(@if {key}\b).*?(@endif {key}<\/i>), with the s flag (or replace every . with [\s\S]). Or better yet, consider using a dedicated parser for your syntax. Commented Jun 30, 2023 at 14:30
  • So, it's basically HTML templating? => stackoverflow.com/a/65596807/982149 Commented Jun 30, 2023 at 14:55
  • Thank you @markalex, that solution works. I wasn't after an alternative way of doing things. I'm aware of the HTML Agility Pack and at some point will upgrade my project to use it, but for now I am where I am and just needed that solution! Commented Jun 30, 2023 at 15:07

1 Answer

You can use the following regex: <i\b(?:(?!<i\b).)*?(@if {key}\b).*?(@endif {key}<\/i>).

I modified your regex in the following ways:

  • Changed <i\s to <i\b, so that a tag without any attributes (<i>) also matches.
  • Replaced the first .*? with (?:(?!<i\b).)*?, which prevents another <i from appearing between the matched <i and the @if marker. Note that a lazy quantifier only affects where a match ends, not where it starts: the engine still begins at the leftmost <i it finds. To avoid matching strings like <i>smth</i>la-la<i>@if {key} from the first <i, you need this construction.
  • Added \b after @if {key}, to prevent partial matching of keys.

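string condition = "includesDownloadables";
// <i\b             : an opening <i tag, with or without attributes
// (?:(?!<i\b).)*?  : any characters, but never crossing into another <i tag
// @if {key}\b      : the marker for this exact key (no partial key matches)
string pattern = @"<i\b(?:(?!<i\b).)*?(@if {key}\b).*?(@endif {key}<\/i>)";
// Regex.Escape guards against keys that contain regex metacharacters.
// Singleline lets . match newlines; Multiline is kept from the question but has no effect here (no ^ or $).
Regex regex = new Regex(pattern.Replace("{key}", Regex.Escape(condition)), RegexOptions.Multiline | RegexOptions.Singleline | RegexOptions.IgnoreCase);
string MyString = "All the html to read";
MyString = regex.Replace(MyString, "");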
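
To see the effect on the document from the question, here is a minimal, self-contained sketch. The class name TemplateStripper and the method RemoveBlock are hypothetical names chosen for illustration, and the sample HTML is a shortened version of the one in the question. The capture groups and the Multiline option are dropped here because this sketch does not use them.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TemplateStripper
{
    // Hypothetical helper: removes one "@if key ... @endif key" block,
    // including both <i> marker tags, using the tempered pattern above.
    public static string RemoveBlock(string html, string key)
    {
        string pattern = @"<i\b(?:(?!<i\b).)*?@if {key}\b.*?@endif {key}</i>"
            .Replace("{key}", Regex.Escape(key));
        // Singleline: '.' also matches newlines, so a block may span lines.
        return Regex.Replace(html, pattern, "",
            RegexOptions.Singleline | RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        string html =
            "<div>keep 1</div>\n" +
            "<i class=\"special_i\">@if includesDownloadables</i>\n" +
            "    conditional A\n" +
            "<i class=\"special_i\">@endif includesDownloadables</i>\n" +
            "<div>keep 2</div>\n" +
            "<i class=\"special_i\">@if instantDownload</i>\n" +
            "    conditional B\n" +
            "<i class=\"special_i\">@endif instantDownload</i>\n" +
            "<div>keep 3</div>";

        // Only the instantDownload block is removed; the earlier
        // includesDownloadables block stays in place.
        Console.WriteLine(RemoveBlock(html, "instantDownload"));
    }
}
```

With the pattern from the question (<i\s.*?...), the same call would start matching at the first <i tag and remove the includesDownloadables block as well.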

Demo of regex here.


Mandatory notice

If you are using a widely used templating solution, consider using its own parser.

It might not be as simple as the suggested HTML Agility Pack, because the templating works on top of the HTML, and the template itself might contain invalid HTML.

But a dedicated parser is a better fit for the task: it is aware of the syntax and can correctly handle nesting, for example.
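
As a small illustration of the nesting problem, the sketch below applies the same regex to a block nested inside another block with the same key. The class name NestingDemo and the input string are made up for this example. Because .*? stops at the first @endif for the key, the outer block is cut short and a stray @endif marker is left behind.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NestingDemo
{
    // Same tempered pattern as in the answer, applied to one key.
    public static string RemoveBlock(string html, string key)
    {
        string pattern = @"<i\b(?:(?!<i\b).)*?@if {key}\b.*?@endif {key}</i>"
            .Replace("{key}", Regex.Escape(key));
        return Regex.Replace(html, pattern, "",
            RegexOptions.Singleline | RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        // Hypothetical input: an "a" block nested inside another "a" block.
        string html =
            "<i>@if a</i>outer<i>@if a</i>inner<i>@endif a</i>tail<i>@endif a</i>";

        // The lazy .*? ends at the FIRST "@endif a</i>", so the match covers
        // the outer opening marker through the inner closing marker only.
        Console.WriteLine(RemoveBlock(html, "a"));
        // prints: tail<i>@endif a</i>
    }
}
```

A parser that tracks open and close markers would pair the outer @if with the last @endif instead.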
