
I need to split a string that consists of HTML elements.

I want to split on the two characters "<" and ">".

var htmlElements = "<p>lorem ipsum</p><span>nisi sapien</span><ul><li>list items</li></ul>";
string[] arrayOfElements = htmlElements.Split('<', '>')[1];

This code only pulls out the first "p". I need to pull every element into a string array. The closing tags (such as </p>) don't matter; I only need the opening tag of each element.

The desired output is a string array containing p, span, ul and li.

4
  • 1
    What is the desired output? Commented Oct 25, 2016 at 9:22
  • 2
    use HTMLAgilityPack. Commented Oct 25, 2016 at 9:23
  • See the updated main post for the desired output. I would like to solve it WITHOUT using a third-party library. Commented Oct 25, 2016 at 9:25
  • jQuery can easily do that? Commented Oct 25, 2016 at 9:26

1 Answer


I suggest using regular expressions in order to extract (match) the required values:

using System;
using System.Linq;
using System.Text.RegularExpressions;

string htmlElements = "<p>lorem ipsum</p><span>nisi sapien</span><ul><li>list items</li></ul>";

string[] arrayOfElements = Regex
  .Matches(htmlElements, @"<(\w+)>")
  .OfType<Match>()
  .Select(m => m.Groups[1].Value)
  .ToArray();

Test

// p span ul li
Console.Write(string.Join(" ", arrayOfElements));

In general, parsing HTML with regular expressions is a bad idea, but if you just want to obtain the elements' names, it can be good enough.
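One limit of the pattern above is that `<(\w+)>` only matches tags with no attributes, so `<p class="intro">` or `<br/>` would be skipped. If the input may contain such tags, a slightly looser pattern could be used. The following is a minimal sketch, not a robust HTML parser: the class name `TagExtractor`, the method name `ExtractTagNames` and the sample input are made up for illustration, and the pattern still assumes that no attribute value contains a `>` character.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class TagExtractor
{
    // '<', a tag name starting with a letter, then anything up to the next '>'.
    // Closing tags such as </p> are skipped because '/' is not a letter.
    // Assumption: attribute values never contain '>'.
    private const string OpeningTagPattern = @"<([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>";

    public static string[] ExtractTagNames(string html)
    {
        return Regex
            .Matches(html, OpeningTagPattern)
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .ToArray();
    }

    public static void Main()
    {
        // Made-up sample input with attributes and a self-closing tag.
        string html = "<p class=\"intro\">lorem</p><br/><span id=\"s\">nisi</span><ul><li>item</li></ul>";

        // p br span ul li
        Console.WriteLine(string.Join(" ", ExtractTagNames(html)));
    }
}
```

For input that is not under your control (comments, scripts, attribute values containing `>`), a real HTML parser remains the safer choice.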


1 Comment

It worked just as expected. I don't know why someone downvoted it?
