7

I am using c#. I have following string

<li> 
    <a href="abc">P1</a> 
    <ul>
        <li><a href = "bcd">P11</a></li>
        <li><a href = "bcd">P12</a></li>
        <li><a href = "bcd">P13</a></li>
        <li><a href = "bcd">P14</a></li>
    </ul>
</li>
<li> 
    <a href="abc">P2</a> 
    <ul>
        <li><a href = "bcd">P21</a></li>
        <li><a href = "bcd">P22</a></li>
        <li><a href = "bcd">P23</a></li>
    </ul>
</li>
<li> 
    <a href="abc">P3</a> 
    <ul>
        <li><a href = "bcd">P31</a></li>
        <li><a href = "bcd">P32</a></li>
        <li><a href = "bcd">P33</a></li>
        <li><a href = "bcd">P34</a></li>
    </ul>
</li>
<li> 
    <a href="abc">P4</a> 
    <ul>
        <li><a href = "bcd">P41</a></li>
        <li><a href = "bcd">P42</a></li>
    </ul>
</li>

My aim is to fill the following list from the above string.

List<class1>

class1 has two properties,

string parent;
List<string> children;

It should fill P1 in parent and P11,P12,P13,P14 in children, and make a list of them.

Any suggestion will be helpful.

Edit

Sample

public List<class1> getElements()
{
    List<class1> temp = new List<class1>();
    foreach(// <a> element in string)
    {
        //in the recursive loop
        List<string> str = new List<string>();
        str.add("P11");
        str.add("P12");
        str.add("P13");
        str.add("P14");

        class1 obj = new class1("P1",str);
        temp.add(obj);
    }
    return temp;
}

the values are hard coded here, but it would be dynamic.

10
  • 8
    That's work for Html Agility Pack. Commented Nov 30, 2012 at 13:20
  • Sorry I cant use and download any thirdparty tool or dll Commented Nov 30, 2012 at 13:21
  • @user1866361 Then you have the WebBrowser control Commented Nov 30, 2012 at 13:22
  • @lazyberezovsky : If we are searching in string having <li> inside <li>, then it would be recursive Commented Nov 30, 2012 at 13:23
  • You can download Html Agility Pack from nuget, it will add a dll to your project as a reference, there is plenty of documentation out there Commented Nov 30, 2012 at 13:24

3 Answers 3

4

What you want is a recursive descent parser. All the other suggestions of using libraries are basically suggesting that you use a recursive descent parser for HTML or XML that has been written by others.

The basic structure of a recursive descent parser is to do a linear search of a list of tokens (in your case a string) and upon encountering a token that delimits a sub entity call the parser again to process the sublist of tokens (substring).

You can Google for the term "recursive descent parser" and find plenty of useful result. Even the Wikipedia article is fairly good in this case and includes an example of a recursive descent parser in C.

Sign up to request clarification or add additional context in comments.

Comments

3

If you can't use a third party tool like my recommended Html Agility Pack you could use the Webbrowser class and the HtmlDocument class to parse the HTML:

WebBrowser wbc = new WebBrowser();
wbc.DocumentText = "foo"; // necessary to create the document
HtmlDocument doc = wbc.Document.OpenNew(true);
doc.Write((string)html); // insert your html-string here
List<class1> elements = wbc.Document.GetElementsByTagName("li").Cast<HtmlElement>()
    .Where(li => li.Children.Count == 2)
    .Select(outerLi => new class1
    {
        parent = outerLi.FirstChild.InnerText,
        children = outerLi.Children.Cast<HtmlElement>()
            .Last().Children.Cast<HtmlElement>()
            .Select(innerLi => innerLi.FirstChild.InnerText).ToList()
    }).ToList();

Here's the result in the debugger window:

enter image description here

2 Comments

I tried your code, but unfortunately it is showing following error ActiveX control '8856f961-340a-11d0-a96b-00c04fd705a2' cannot be instantiated because the current thread is not in a single-threaded apartment.
@user1866361: Is this a console application? It seems that this will solve the problem: stackoverflow.com/q/5519294/284240 Webbrowser is a Winforms class.
1

You can also use XmlDocument:

XmlDocument doc = new XmlDocument();
doc.LoadXml(yourInputString);
XmlNodeList colNodes = xmlSource.SelectNodes("li");
foreach (XmlNode node in colNodes)
{
    // ... your logic here
    // for example
    // string parentName = node.SelectSingleNode("a").InnerText;
    // string parentHref = node.SelectSingleNode("a").Attribures["href"].Value;
    // XmlNodeList children = 
    //       node.SelectSingleNode("ul").SelectNodes("li");
    // foreach (XmlNode child in children)
    // {
    //         ......
    // }
}

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.