I am creating a regex library to work with HTML (I'll post it on MSDN Code when it's done). One of the methods removes any whitespace before a closing tag.

    <p>See the dog run </p>

It would eliminate the space before the closing paragraph tag. I am using this:

    public static string RemoveWhiteSpaceBeforeClosingTag(string text)
    {
        string pattern = @"(\s+)(?:</)";
        return Regex.Replace(text, pattern, "</", RegexOptions.Singleline | RegexOptions.IgnoreCase);
    }

As you can see I am replacing the spaces with </ since I cannot seem to match just the space and exclude the closing tag. I know there's a way - I just haven't figured it out.

  • FYI, both the Singleline and IgnoreCase modifiers are irrelevant, as there are no dots or letters in the regex. Commented May 22, 2009 at 22:19

2 Answers

\s+(?=</)

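is the expression you're after. It means one or more whitespace characters followed by </. Because (?=</) is a lookahead, the </ has to be present but is not part of the match, so only the whitespace gets replaced.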

That all being said, regular expressions are a flaky and error-prone way of processing HTML so should be used with caution if at all.

1 Comment

That was it - thanks. I wish there was an alternative to processing the HTML I'm getting. You should have seen the IndexOf and LastIndexOf code that this is replacing 8-\

You want a lookahead (?=) pattern:

\s+(?=</)

Each match can then be replaced with the empty string "".
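
For reference, here is a minimal sketch of how the method from the question might look with the lookahead applied. The class name HtmlWhitespace is only a placeholder, not something from the original library, and the RegexOptions are left out because the pattern has no dots or letters for them to affect.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical container class; the name is an assumption for this sketch.
public static class HtmlWhitespace
{
    public static string RemoveWhiteSpaceBeforeClosingTag(string text)
    {
        // \s+    matches one or more whitespace characters (including newlines).
        // (?=</) requires "</" to follow, but does not consume it,
        //        so the closing tag itself is left untouched.
        string pattern = @"\s+(?=</)";
        return Regex.Replace(text, pattern, "");
    }
}
```

Calling HtmlWhitespace.RemoveWhiteSpaceBeforeClosingTag("<p>See the dog run </p>") should give back <p>See the dog run</p>. Note that \s also matches line breaks, so whitespace-only lines before a closing tag are removed as well.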
