I have the following HTML, and I used Google Chrome to get the XPath expressions.

<DIV id=TasheelPaymentCtrl1_dvPayment>
<TABLE border=1 cellSpacing=0 borderColor=black cellPadding=7 width=625 align=center>
<TBODY>
<TR>
<TD class=ReceiptHeadArbCenterHead1 width=320>المسمى </TD>
<TD class=ReceiptHeadArbCenterHead1 width=75>دفع إلى</TD>
<TD class=ReceiptHeadArbCenterHead1 width=75>القيمة</TD>
<TD class=ReceiptHeadArbCenterHead1 width=75>الكمية</TD>
<TD class=ReceiptHeadArbCenterHead1 width=75>المجموع</TD></TR>
<TR>
<TD class=ReceiptHeadArbCenterHead>رسوم وزارة العمل</TD>
<TD class=ReceiptValueArbCenter>MOFI</TD>
<TD class=ReceiptValueArbCenter>3</TD>
<TD class=ReceiptValueArbCenter>1</TD>
<TD class=ReceiptValueArbCenter>3</TD>
<TR>
<TD class=ReceiptHeadArbCenterHead>رسوم الدرهم الإلكتروني</TD>
<TD class=ReceiptValueArbCenter>MOFI</TD>
<TD class=ReceiptValueArbCenter>3</TD>
<TD class=ReceiptValueArbCenter>1</TD>
<TD class=ReceiptValueArbCenter>3</TD>
<TR>
<TD class=ReceiptHeadArbCenterHead>رسوم مراكز الخدمة </TD>
<TD class=ReceiptValueArbCenter>MOFI</TD>
<TD class=ReceiptValueArbCenter>47</TD>
<TD class=ReceiptValueArbCenter>1</TD>
<TD class=ReceiptValueArbCenter>47</TD>
<TR>
<TD class=ReceiptHeadArbCenterHead1 colSpan=4>المجموع</TD>
<TD class=ReceiptValueArbCenter>53</TD></TR></TBODY></TABLE></DIV>

I want to extract the values 3, 3, 47 and 53.

I tried using these XPath expressions:

    var gf = doc.DocumentNode.SelectNodes("//div[@id='TasheelPaymentCtrl1_dvPayment']/table/tbody/tr[2]/td[5]");
    foreach (var node in gf)
    {
        Console.WriteLine(node.InnerText); // output: "3"
    }

    var sf = doc.DocumentNode.SelectNodes("//div[@id='TasheelPaymentCtrl1_dvPayment']/table/tbody/tr[3]/td[5]");
    foreach (var node in sf)
    {
        Console.WriteLine(node.InnerText); // output: "3"
    }

    var tf = doc.DocumentNode.SelectNodes("//div[@id='TasheelPaymentCtrl1_dvPayment']/table/tbody/tr[4]/td[5]");
    foreach (var node in tf)
    {
        Console.WriteLine(node.InnerText); // output: "47"
    }

    var Allf = doc.DocumentNode.SelectNodes("//div[@id='TasheelPaymentCtrl1_dvPayment']/table/tbody/tr[5]/td[2]");
    foreach (var node in Allf)
    {
        Console.WriteLine(node.InnerText); // output: "53"
    }

But I am getting a NullReferenceException. I used Google Chrome's developer tools to copy the XPath. Why am I getting the NullReferenceException? Is there a mistake in the XPath, and how can I extract the values? Please help me.

  • Where are you getting the NullReferenceException? Commented Mar 3, 2016 at 7:20
  • Possible duplicate of What is a NullReferenceException and how do I fix it? Commented Mar 3, 2016 at 7:20
  • @DanielHilgarth, I am getting the NullReferenceException in the foreach loop over "gf". Commented Mar 3, 2016 at 7:24
  • So, it looks like the XPath is not working. Are you sure that the indexes are one-based? I would think that this XPath looks more correct with zero based indexes: //div[@id='TasheelPaymentCtrl1_dvPayment']/table/tbody/tr[1]/td[4] Commented Mar 3, 2016 at 7:27
  • @DanielHilgarth Please explain what one-based and zero-based mean. I am not sure about that. Commented Mar 3, 2016 at 7:34

1 Answer


As you have discovered, some of your XPath expressions don't work because the <tr> tags are not all closed.

Therefore, you will need to cater for this in your XPath expressions:

  • //div[@id='TasheelPaymentCtrl1_dvPayment']/table/tbody/tr[2]/td[5] - no change
  • //div[@id='TasheelPaymentCtrl1_dvPayment']/table/tbody/tr[3]/td[5] - should be //div[@id='TasheelPaymentCtrl1_dvPayment']/table/tbody/tr[2]/tr/td[5]
  • //div[@id='TasheelPaymentCtrl1_dvPayment']/table/tbody/tr[4]/td[5] - should be //div[@id='TasheelPaymentCtrl1_dvPayment']/table/tbody/tr[2]/tr/tr/td[5]
  • //div[@id='TasheelPaymentCtrl1_dvPayment']/table/tbody/tr[5]/td[2] - should be //div[@id='TasheelPaymentCtrl1_dvPayment']/table/tbody/tr[2]/tr/tr/tr/td[2]
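
As an alternative to spelling out the nesting, here is a minimal sketch of a single query that does not depend on how the unclosed <tr> tags end up nested. It assumes that HtmlAgilityPack's SelectNodes returns null when nothing matches (which would explain the exception in the foreach loops), and it uses a shortened two-column copy of the table with placeholder labels and values rather than the real page. The XPath takes the last <td> of every row and keeps only the cells with the value class, so the header row is skipped.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Hypothetical, shortened copy of the table: same ids and classes,
        // same unclosed <TR> tags, placeholder labels and values.
        string html = @"<DIV id=TasheelPaymentCtrl1_dvPayment>
<TABLE><TBODY>
<TR><TD class=ReceiptHeadArbCenterHead1>Name</TD><TD class=ReceiptHeadArbCenterHead1>Total</TD></TR>
<TR><TD class=ReceiptHeadArbCenterHead>Fee A</TD><TD class=ReceiptValueArbCenter>3</TD>
<TR><TD class=ReceiptHeadArbCenterHead>Fee B</TD><TD class=ReceiptValueArbCenter>47</TD>
<TR><TD class=ReceiptHeadArbCenterHead1>Total</TD><TD class=ReceiptValueArbCenter>50</TD></TR>
</TBODY></TABLE></DIV>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // '//tr' matches rows at any depth, so it does not matter whether the
        // parser treats the unclosed rows as siblings or as nested elements.
        // 'td[last()]' is the last cell of each row; the class test drops the header row.
        var cells = doc.DocumentNode.SelectNodes(
            "//div[@id='TasheelPaymentCtrl1_dvPayment']//tr/td[last()][@class='ReceiptValueArbCenter']");

        // Guard against null before iterating instead of assuming a match.
        if (cells == null)
        {
            Console.WriteLine("No matching cells found; check the XPath.");
            return;
        }

        foreach (var cell in cells)
        {
            Console.WriteLine(cell.InnerText.Trim());
        }
    }
}
```

Checking for null before each foreach in the original code would also show which of the four expressions fails to match.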

1 Comment

Good, but I came up with another option: cleaning up the HTML by closing the open <tr> tags before loading it. HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument(); text = Regex.Replace(text, "<tr[^>]*>(?:(?!</?tr>|</tbody>|</table>).)*?(?=<tr[^>]*>|</tbody>|</table>)", "$&</tr>", RegexOptions.Singleline | RegexOptions.IgnoreCase); doc.LoadHtml(text); doc.OptionAutoCloseOnEnd = true;
