-5

Is it possible to remove all whitespace from the following HTML string in C#:

"
<html>

<body>

</body>

</html>
"

Thanks

5
  • 11
    Here: <html><body></body></html> Commented Aug 24, 2012 at 10:47
  • 1
    To the downvoters, provide an answer along with your downvote. Commented Aug 24, 2012 at 10:48
  • 6
    @OscarRyz - There is no obligation to do so. However, the question shows no research or effort. I am assuming this is why it is getting downvoted. The question is also not a realistic representation of an actual programming issue - it doesn't have any context. Commented Aug 24, 2012 at 10:49
  • 2
    When dealing with HTML, or any markup for that matter, rather than just hacking at what you think is just a string, it's best to run it through a parser that understands it. You can use HtmlAgilityPack to parse it and write it back out properly (htmlagilitypack.codeplex.com), or use HTML Tidy (stackoverflow.com/questions/2593147/… and tidy.sourceforge.net), or something like it. Commented Aug 24, 2012 at 10:50
  • 1
    Do you want to remove white-spaces or empty lines? Commented Aug 24, 2012 at 10:58

4 Answers

5

When dealing with HTML or any markup for that matter, it's usually best to run it through a parser that truly understands the rules of that markup.

The first benefit is that it can tell you if your initial input data is garbage to start with.

If the parser is smart enough it might even be able to correct badly formed markup automatically, or accept it with relaxed rules.

You can then modify the parsed content and have the parser write out the changes. This way you can be sure the markup rules are followed and the output is correct.

For some simple HTML scenarios, or for markup so badly formed that a parser balks at it straight away, you can fall back to hacking the input string with string replacements and the like. Which approach you take depends on your needs.
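
The parse-then-rewrite idea can be sketched with the XML parser that ships with .NET. This is a swapped-in technique, not one of the tools named in this answer, and it only works when the markup is well-formed XHTML; real-world HTML usually needs a proper HTML parser. The class and method names (`XhtmlCompactor`, `Compact`) are made up for illustration.

```csharp
using System;
using System.Xml.Linq;

public static class XhtmlCompactor
{
    // Hypothetical helper: assumes the input is well-formed XHTML.
    // XDocument.Parse throws an XmlException on malformed input,
    // which is the "tells you the input is garbage" benefit.
    public static string Compact(string xhtml)
    {
        // LoadOptions.None does not preserve whitespace-only text
        // between elements while parsing.
        XDocument document = XDocument.Parse(xhtml, LoadOptions.None);

        // DisableFormatting stops the writer from re-indenting the output.
        return document.ToString(SaveOptions.DisableFormatting);
    }
}
```

Calling `XhtmlCompactor.Compact(html)` on the string from the question should produce the markup on a single line. Note that whitespace-only text between inline elements is dropped as well, which may change how a page renders.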

Here are a couple of tools that can help you out:

HTML Tidy

You can use HTML Tidy and just specify some options/rules on how you want your HTML to be tidied up (e.g. remove superfluous whitespace).

It's a Win32 DLL, but there are C# wrappers for it.

HtmlAgilityPack

You can use HtmlAgilityPack to parse HTML if you need to understand the structure better and perhaps do your own tidying up/restructuring.


3
myString = myString.Replace(System.Environment.NewLine, "");
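
Note that `Environment.NewLine` is `"\r\n"` on Windows and `"\n"` on Unix-like systems, so the line above only removes the line breaks that match the current platform. A minimal sketch that removes both characters regardless of where the string came from (the names `LineBreakRemover` and `RemoveLineBreaks` are made up for illustration):

```csharp
using System;

public static class LineBreakRemover
{
    // Hypothetical helper: strips carriage returns and line feeds,
    // independent of the platform's Environment.NewLine value.
    public static string RemoveLineBreaks(string input)
    {
        return input.Replace("\r", "").Replace("\n", "");
    }
}
```

Like the original one-liner, this leaves spaces and tabs untouched.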


1

You can use a regular expression to match white space characters for the replace:

s = Regex.Replace(s, @"\s+", String.Empty);
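
The `\s+` pattern removes every whitespace character, including the spaces inside tags and text, so `<td class="a">x y</td>` becomes `<tdclass="a">xy</td>`. If the result has to remain valid HTML, a narrower variant is to remove only the whitespace that sits between a `>` and the next `<`. The sketch below is an assumption about what is wanted, not part of the original answer, and the names `TagWhitespace` and `RemoveBetweenTags` are made up:

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagWhitespace
{
    // Hypothetical helper: removes whitespace runs between adjacent tags
    // and trims the ends; whitespace inside tags and text is kept.
    public static string RemoveBetweenTags(string html)
    {
        return Regex.Replace(html, @">\s+<", "><").Trim();
    }
}
```

This is still string hacking: it also collapses whitespace inside `<pre>` blocks and between inline elements, where whitespace can be significant.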

4 Comments

@Eric: Yes, it works. It removes white space in the string. Space characters are also white space. Did you downvote my answer because you invented another requirement for the question?
I guess I made the assumption that the result should also be valid HTML.
@Eric: And the assumption that the string would contain something completely different from the example in the question...
You made that assumption too. If you assume the HTML is as shown, then Odad's comment is the only sensible answer.
-1

I used this solution, which in my opinion works well (see also the test code):

  1. Add an extension method to trim the HTML string:
public static string RemoveSuperfluousWhitespaces(this string input)
{
    if (input == null || input.Length < 3) return input; // nothing to compact; also avoids a NullReferenceException
    var resultString = new StringBuilder(); // Using StringBuilder is much faster than using regular expressions here!
    var inputChars = input.ToCharArray();
    var index1 = 0;
    var index2 = 1;
    var index3 = 2;
    // Remove superfluous white spaces from the html stream by the following replacements:
    //  '<no whitespace>' '>' '<whitespace>' ==> '<no whitespace>' '>'
    //  '<whitespace>' '<' '<no whitespace>' ==> '<' '<no whitespace>'
    while (index3 < inputChars.Length)
    {
        var char1 = inputChars[index1];
        var char2 = inputChars[index2];
        var char3 = inputChars[index3];
        if (!Char.IsWhiteSpace(char1) && char2 == '>' && Char.IsWhiteSpace(char3))
        {
            // drop whitespace character in char3
            index3++;
        }
        else if (Char.IsWhiteSpace(char1) && char2 == '<' && !Char.IsWhiteSpace(char3))
        {
            // drop whitespace character in char1
            index1 = index2;
            index2 = index3;
            index3++;
        }
        else
        {
            resultString.Append(char1);
            index1 = index2;
            index2 = index3;
            index3++;
        }
    }

    // (index3 >= inputChars.Length)
    resultString.Append(inputChars[index1]);
    resultString.Append(inputChars[index2]);
    return resultString.ToString();
}

  2. Add test code:

[Test]
public void TestRemoveSuperfluousWhitespaces()
{
    var html1 = "<td class=\"keycolumn\"><p class=\"mandatory\">Some recipe parameter name</p></td>";
    var html2 = $"<td class=\"keycolumn\">{Environment.NewLine}<p class=\"mandatory\">Some recipe parameter name</p>{Environment.NewLine}</td>";
    var html3 = $"<td class=\"keycolumn\">{Environment.NewLine} <p class=\"mandatory\">Some recipe parameter name</p> {Environment.NewLine}</td>";
    var html4 = " <td class=\"keycolumn\"><p class=\"mandatory\">Some recipe parameter name</p></td>";
    var html5 = "<td class=\"keycolumn\"><p class=\"mandatory\">Some recipe parameter name</p></td> ";
    var compactedHtml1 = html1.RemoveSuperfluousWhitespaces();
    compactedHtml1.Should().BeEquivalentTo(html1);
    var compactedHtml2 = html2.RemoveSuperfluousWhitespaces();
    compactedHtml2.Should().BeEquivalentTo(html1);
    var compactedHtml3 = html3.RemoveSuperfluousWhitespaces();
    compactedHtml3.Should().BeEquivalentTo(html1);
    var compactedHtml4 = html4.RemoveSuperfluousWhitespaces();
    compactedHtml4.Should().BeEquivalentTo(html1);
    var compactedHtml5 = html5.RemoveSuperfluousWhitespaces();
    compactedHtml5.Should().BeEquivalentTo(html1);
}
