How to parse a string with html table tags?

Question

I have a string:

string s= "<tr><td>abc</td><td>1</td><td>def</td></tr><tr><td>aaa</td><td>2</td><td>bbb</td></tr>";

Formatted, it looks like this:

<tr>
    <td>abc</td>
    <td>1</td>
    <td>def</td>
</tr>
<tr>
    <td>aaa</td>
    <td>2</td>
    <td>bbb</td>
</tr>

Now I want to get the values "1" and "2". How do I do this? I have tried converting the string to XML, but without success.

A valid XML document must have a single root node. Wrap your string in one before converting.
– Micha Wiedenmann, Commented Jun 15, 2017 at 7:29
Because the real string contains some extra markup, for example: <tr><td>1</td><td align='center'><i class='cls'></i></td><td><a href='test.aspx?id=1&ct=0&lt=2'style='color:#4169E1'>abc</a></td><td>1</td><td><span style='display:none;'>xxxx</span>xxxx</td><td>def</td></tr>
– Brom, Commented Jun 15, 2017 at 7:31
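
The single-root suggestion above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the class name XmlTableDemo and the wrapping <table> element are assumptions made for the example, and it only works when the fragment is well-formed XML. Markup with unescaped & characters in attribute values, like the fuller string in the comment above, would be rejected by the XML parser.

```csharp
using System;
using System.Xml;

static class XmlTableDemo
{
    // Wraps the row fragment in a <table> root (an assumption for this sketch)
    // so that it becomes a well-formed XML document, then reads the second
    // <td> of every <tr> with XPath. Uses only APIs available in .NET 2.0.
    public static string[] SecondCells(string rows)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<table>" + rows + "</table>");

        // td[2] selects the second td child of each tr (XPath is 1-based).
        XmlNodeList nodes = doc.SelectNodes("/table/tr/td[2]");

        string[] result = new string[nodes.Count];
        for (int i = 0; i < nodes.Count; i++)
        {
            result[i] = nodes[i].InnerText;
        }
        return result;
    }
}
```

Calling XmlTableDemo.SecondCells(s) with the string from the question should give the two values "1" and "2".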

Jaimin Dave · Accepted Answer · 2017-06-15 07:33:36Z

2

You can use HTML Agility Pack to achieve this:

// Requires the HtmlAgilityPack package, plus System.Linq for Select.
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(s);

IEnumerable<string> cells = doc.DocumentNode.Descendants("td").Select(td => td.InnerText);

answered Jun 15, 2017 at 7:33

Jaimin Dave


1 Comment

Brom Over a year ago

I'm using .NET Framework 2.0, and maybe it does not support this.

Tien Nguyen Ngoc · Accepted Answer · 2017-06-15 07:40:18Z

1

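string s = "<tr><td>abc</td><td>1</td><td>def</td></tr><tr><td>aaa</td><td>2</td><td>bbb</td></tr>";
// Drop the row tags and the closing cell tags so that only "<td>" separators remain.
s = s.Replace("<tr>","").Replace("</tr>","").Replace("</td>","");
// val[0] is the empty string before the first "<td>"; the cells start at val[1].
string[] val = s.Split(new string[] { "<td>" }, StringSplitOptions.None);

string one = val[2]; // "1"
string two = val[5]; // "2"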

I hope it will work for you.

answered Jun 15, 2017 at 7:40

Tien Nguyen Ngoc


Fruchtzwerg · Accepted Answer · 2017-06-15 07:56:30Z

0

// Requires System.Linq and System.Text.RegularExpressions.
Regex regex = new Regex("<td>(.*?)<\\/td>");
var matches = regex.Matches("<tr><td>abc</td><td>1</td><td>def</td></tr><tr><td>aaa</td><td>2</td><td>bbb</td></tr>");
var values = matches.Cast<Match>().Select(m => m.Groups[1].Value).ToList();

edited Jun 15, 2017 at 7:56

Fruchtzwerg

answered Jun 15, 2017 at 7:53

Daniel Tshuva


user5014677 · Accepted Answer · 2017-06-15 07:58:25Z

0

string s = "<tr><td>abc</td><td>1</td><td>def</td></tr><tr><td>aaa</td><td>2</td><td>bbb</td></tr>";

// Find the first run of digits, then step through the remaining ones.
System.Text.RegularExpressions.Match number = System.Text.RegularExpressions.Regex.Match(s, @"\d+");
while (number.Success)
{
    MessageBox.Show(number.Value);
    // NextMatch continues the search after the end of the current match.
    number = number.NextMatch();
}

The regex matches every number in the string, and the while loop goes through all of them. Replace MessageBox.Show with whatever you need and you're good to go.
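
Note that \d+ also matches digits inside attribute values (the fuller markup in the question's comments contains id=1 and #4169E1), so picking cells by position may be more robust. The following is a rough sketch under that assumption; the class and method names (RowCells, Column) are hypothetical, it avoids LINQ so that it should compile against .NET Framework 2.0, and it does not handle nested tables.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class RowCells
{
    // Returns the text of the cell at cellIndex (0-based) for every <tr> in html.
    // Rows that have fewer cells than cellIndex + 1 are skipped.
    public static List<string> Column(string html, int cellIndex)
    {
        List<string> result = new List<string>();
        RegexOptions options = RegexOptions.IgnoreCase | RegexOptions.Singleline;

        // [^>]* tolerates attributes such as align='center' on the tags.
        Regex row = new Regex(@"<tr\b[^>]*>(.*?)</tr>", options);
        Regex cell = new Regex(@"<td\b[^>]*>(.*?)</td>", options);

        foreach (Match r in row.Matches(html))
        {
            MatchCollection cells = cell.Matches(r.Groups[1].Value);
            if (cellIndex < cells.Count)
            {
                // Remove any nested tags (for example <span> or <a>) from the cell.
                string inner = cells[cellIndex].Groups[1].Value;
                result.Add(Regex.Replace(inner, "<[^>]+>", ""));
            }
        }
        return result;
    }
}
```

For the string in the question, RowCells.Column(s, 1) should return the two values "1" and "2".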

answered Jun 15, 2017 at 7:58

user5014677


Johann Nel · Accepted Answer · 2017-06-15 08:28:42Z

0

Good day Brom

This might not be the solution you were looking for, but it is one of many possible approaches.

I would use this regex to extract all the tags:

(<\/[a-z]*>)+(<[a-z]*>)+|(<[a-z]*>)+(<\/[a-z]*>)+|(<[a-z]*>)+|(<\/[a-z]*>)+

Example:

  string input = "<tr><td>abc</td><td>1</td><td>def</td></tr><tr><td>aaa</td><td>2</td><td>bbb</td></tr>";
  string replacement = "#";

  // Verbatim string (@"...") so that \/ reaches the regex engine unchanged;
  // in a regular string literal, \/ is not a valid C# escape sequence.
  string pattern = @"(<\/[a-z]*>)+(<[a-z]*>)+|(<[a-z]*>)+(<\/[a-z]*>)+|(<[a-z]*>)+|(<\/[a-z]*>)+";

  RegexOptions options = RegexOptions.IgnoreCase | RegexOptions.Compiled | RegexOptions.Multiline;

  Regex rgx = new Regex(pattern, options);

  string result = rgx.Replace(input, replacement);
  // result == "#abc#1#def#aaa#2#bbb#"

This regex will grab the tags either in groups or individually. You can then replace each match with a delimiter such as a pipe "|" or "#" and split on that. I hope this helps.
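
The replace-then-split step described above could look roughly like this. It is a sketch only: the class name TagSplitter is made up for the example, and it assumes that "#" never appears in the cell text itself.

```csharp
using System;
using System.Text.RegularExpressions;

static class TagSplitter
{
    // The same pattern as in this answer, written as a verbatim string.
    const string Pattern =
        @"(<\/[a-z]*>)+(<[a-z]*>)+|(<[a-z]*>)+(<\/[a-z]*>)+|(<[a-z]*>)+|(<\/[a-z]*>)+";

    // Replaces every run of tags with "#" and splits on it.
    // Assumes "#" does not occur inside the cell text.
    public static string[] Cells(string input)
    {
        string delimited = Regex.Replace(input, Pattern, "#", RegexOptions.IgnoreCase);

        // RemoveEmptyEntries drops the empty strings produced by the
        // leading and trailing "#".
        return delimited.Split(new char[] { '#' }, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

With the string from the question, TagSplitter.Cells(input) should yield six entries, with "1" at index 1 and "2" at index 4. Because RemoveEmptyEntries also discards genuinely empty cells, the indexes would shift if a row contained an empty <td></td>.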

Kind Regards

P.S. Regex explanation: the pipes act as OR operators.

(<\/[a-z]*>)+(<[a-z]*>)+ // Closing tag(s) that are followed by opening tag(s)
(<[a-z]*>)+(<\/[a-z]*>)+ // Opening tags followed by closing tags
(<[a-z]*>)+ // one or more opening tags
(<\/[a-z]*>)+ // one or more closing tags

answered Jun 15, 2017 at 8:28

Johann Nel

1 Comment

Johann Nel Over a year ago

Also, just to mention: this regex should work on any HTML/XML elements written without attributes. I'm not completely sure what the outcome will be with self-closing tags.
