C# regex to replace on last occurrence of pattern

Question

I built an extension to convert HTML-formatted text into something better suited to a list view. It removes all HTML tags, except that it replaces closing <h> and <p> tags with <br /> to keep the text readable in the list view. It also shortens the text of longer posts. I render the result in my Razor view with Html.Raw(model.text).

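public static string FixHTML(string input, int? strLen)
{
    string s = input.Trim();
    // Turn closing paragraph and heading tags into line breaks.
    s = Regex.Replace(s, "</p.*?>", "<br />");
    s = Regex.Replace(s, "</h.*?>", "<br />");
    // Protect the line breaks with a placeholder before stripping all other tags.
    s = s.Replace("<br />", "*ret$990^&");
    s = Regex.Replace(s, "<.*?>", String.Empty);
    // Drop a dangling, unclosed closing tag and whatever follows it on that line.
    s = Regex.Replace(s, "</.*", String.Empty);
    // Restore the line breaks.
    s = s.Replace("*ret$990^&", "<br />");
    // Truncate to at most strLen characters (no truncation when strLen is null).
    int i = (strLen ?? s.Length);
    s = s.Substring(0, (i > s.Length ? s.Length : i));
    return s;
}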

PROBLEM: if the text gets cut off in the middle of a <br />, it messes up the displayed text. For example, if it is cut off at "blah blah blah <br", the display isn't nice. How can I use a regex (or even a string replace) to find only the last occurrence of <b..., and only if it doesn't have a closing >?

I was thinking of something like:

s = string.Format(s.Substring(0, s.Length-6) + Regex.Replace(s.Substring(s.Length - 6), "<.*", string.Empty));

That will probably work, but my whole converter seems to use a lot of code to do something that should be relatively simple.

How can I do this?

Is there anything that IS recommended to "clean" HTML? What I am doing above works, but I agree it's not pretty.
– dave317, Commented Jan 18, 2018 at 20:40
Possible duplicate of RegEx match open tags except XHTML self-contained tags
– Lews Therin, Commented Jan 18, 2018 at 21:26
I would suggest a library such as HtmlAgilityPack to parse through and change your HTML.
– Mike Kuenzi, Commented Jan 18, 2018 at 22:21

SBFrancies · Accepted Answer · 2018-01-18 22:15:38Z

2

Try this:

s = Regex.Replace(s, "(<|<b|<br|<br/)$", "", RegexOptions.None);
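Note that the converter in the question emits "<br />" with a space before the slash, so truncation can also leave "<br " or "<br /" at the end, and the pattern above does not list those two prefixes. A minimal sketch that adds them, assuming the goal is only to drop a partially cut <br /> (the names PartialBrTrimmer and TrimPartialBr are hypothetical, not from the original post):

```csharp
using System;
using System.Text.RegularExpressions;

public static class PartialBrTrimmer
{
    // Every proper prefix of "<br />" and of "<br/>", anchored at the end of the input.
    private static readonly Regex PartialBr =
        new Regex(@"(<|<b|<br|<br/|<br |<br /)$");

    // Hypothetical helper: removes a <br /> that was cut off by truncation.
    // A complete "<br />" is left alone because the string then ends in ">".
    public static string TrimPartialBr(string s)
    {
        return PartialBr.Replace(s, string.Empty);
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(PartialBrTrimmer.TrimPartialBr("blah blah blah <br /"));
        Console.WriteLine(PartialBrTrimmer.TrimPartialBr("blah <br />blah"));
    }
}
```

In .NET, "$" without RegexOptions.Multiline also matches just before a final newline; "\z" could be used instead if only the absolute end of the string should count.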


2 Comments

Rudism Over a year ago

An alternate regex that would catch all incomplete html tags (not just br) at the end of a string would be "<[^>]*$".
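
As a rough illustration of this comment's pattern, the sketch below applies "<[^>]*$" after truncation. The pattern matches a "<" followed by any run of characters other than ">" up to the end of the string, so any tag left without its closing ">" is removed. The names UnclosedTagTrimmer and TrimUnclosedTag are hypothetical:

```csharp
using System;
using System.Text.RegularExpressions;

public static class UnclosedTagTrimmer
{
    // "<", then anything except ">", up to the end of the input.
    private static readonly Regex UnclosedTag = new Regex(@"<[^>]*$");

    // Hypothetical helper: drops a trailing tag that never reached its ">".
    public static string TrimUnclosedTag(string s)
    {
        return UnclosedTag.Replace(s, string.Empty);
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(UnclosedTagTrimmer.TrimUnclosedTag("one <br />two <sp"));
        Console.WriteLine(UnclosedTagTrimmer.TrimUnclosedTag("done <br />"));
    }
}
```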

SBFrancies Over a year ago

@Rudism - definitely a good solution, the only problem might be if the "<" character appeared in the text not as part of a tag
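
To make that caveat concrete, here is a small sketch comparing the loose pattern "<[^>]*$" with a stricter variant that only fires when the "<" is followed by an optional "/" and a letter, or is the very last character. The stricter pattern and the names Loose and Strict are assumptions for illustration, not something proposed in the thread, and the result is still a heuristic: text such as "1 <b 2" would be treated as a cut-off tag.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TrailingTagPatterns
{
    // Loose: any "<" with no ">" after it, through the end of the input.
    public static string Loose(string s)
    {
        return Regex.Replace(s, @"<[^>]*$", string.Empty);
    }

    // Stricter (assumed variant): "<" must start something tag-like
    // ("<br", "</p", ...) or be the final character of the input.
    public static string Strict(string s)
    {
        return Regex.Replace(s, @"<(/?[A-Za-z][^<>]*)?$", string.Empty);
    }
}

public static class Program
{
    public static void Main()
    {
        string text = "if a < b then";
        Console.WriteLine(TrailingTagPatterns.Loose(text));  // everything from "<" on is removed
        Console.WriteLine(TrailingTagPatterns.Strict(text)); // left unchanged
        Console.WriteLine(TrailingTagPatterns.Strict("blah blah blah <br"));
    }
}
```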
