
Using jsoup, I can retrieve the "Registration Date" label, but not the value in the second column, 31/12/2009; an empty string is returned instead. I have tried every approach I could think of.

All the table rows are extracted correctly.

<tr>
<td style="width: 30%; font-weight: bold; background-color: #d7e8ff; ">
  <span style="font-size: 10pt"> Registration Date</span></td>
<td style="margin-bottom: 1px; padding-bottom: 1px; background-color: lemonchiffon;">
 <span id="ContentPlaceHolder1_636042629082042500">الخميس 31/12/2009</span>
 <span id="ContentPlaceHolder1_iInstalldate"></span></td>
</tr>

Here is the Java code I'm using:

Element table = doc.select("TABLE").get(2);
Elements table1 = table.select("table[border=1]"); // select the particular table
Elements rows = table1.select("tr");

for (int i = 0; i < rows.size(); i++) {
    Element row = rows.get(i);
    Elements cols = row.select("td");
    for (Element col : cols) {
        if (!col.text().equals("")) {
            Log.e("test", col.text() + cols.size());
        }
    }
}

Here is the output; it contains only the first-column values, not the second:

Registration Date , Account Type , Current Account Status , Total Account Credit , Used Credit , Valid Credit , Credit Expiry Date

Here is a sample of the page source for this table, showing two of its rows:

<tr>
  <td style="width: 30%; font-weight: bold; background-color: #d7e8ff; ">
  <span style="font-size: 10pt">Registration Date</span></td>

 <td style="margin-bottom: 1px; padding-bottom: 1px; background-color:  
     lemonchiffon;">

  <span id="ContentPlaceHolder1_636045303384071212">الخميس 31/12/2009</span>
  <span id="ContentPlaceHolder1_iInstalldate"></span></td>
 </tr>
 <tr>
 <td style="width: 30%; font-weight: bold; background-color: #d7e8ff; 
     ">Account Type</td>
 <td style="margin-bottom: 1px; padding-bottom: 1px; background-color: 
       lemonchiffon;">

<span id="ContentPlaceHolder1_636045303384071212">1 Mbps---فضي</span>
<span id="ContentPlaceHolder1_iAcctType"></span></td>
</tr>

Here is the code I am using to access the web page:

loginForm = Jsoup.connect("http://adsl.yemen.net.ye/en/user_main.aspx")
        .data("ctl00$ContentPlaceHolder1$loginframe$Password", "MAMAM")
        .data("ctl00$ContentPlaceHolder1$loginframe$LoginButton", "Sign In")
        .data("__LASTFOCUS", "")
        .data("__EVENTTARGET", "")
        .data("__EVENTARGUMENT", "")
        .userAgent("Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, "
                + "like Gecko) Chrome/51.0.2704.103 Safari/537.36")
        .cookies(loginForm.cookies())
        .followRedirects(false)
        .method(Connection.Method.POST)
        .execute();
  • Can you please add your code? Commented Jul 17, 2016 at 11:29
  • I have posted the code above; it retrieves only the values in the first column. This is the login page I am accessing: adsl.yemen.net.ye/en/login.aspx (username: MASALAHI2010, password: MAMAM). On the next page I need to retrieve all the table contents, but the code does not return the second-column values. Commented Jul 17, 2016 at 12:20
  • Maybe the username is missing from your code. Commented Jul 20, 2016 at 14:00
  • I have added the username as well, but the issue remains; the whole HTML might not be extracted. Commented Jul 21, 2016 at 1:26
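
A note on the login request discussed in these comments: the posted code sends the password and a few empty ASP.NET fields. WebForms pages typically also expect the hidden fields rendered in the login form (such as __VIEWSTATE) to be posted back, so a partial POST may return a page without the account data. The sketch below shows one possible two-step login with jsoup: GET the login page, copy its hidden inputs, then POST with the same cookies. The names LoginSketch, hiddenFields and login are made up for illustration, and the UserName field name is an assumption modelled on the Password field name in the question; check the real form's source before relying on it.

```java
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

final class LoginSketch {

    /** Copies every named hidden input (e.g. __VIEWSTATE) into a map. */
    static Map<String, String> hiddenFields(Document loginPage) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (Element input : loginPage.select("input[type=hidden][name]")) {
            fields.put(input.attr("name"), input.val());
        }
        return fields;
    }

    /** Hypothetical helper: logs in and returns the page the server answers with. */
    static Document login(String loginUrl, String user, String password) throws IOException {
        // Step 1: GET the login page to obtain session cookies and hidden fields.
        Connection.Response loginForm = Jsoup.connect(loginUrl)
                .method(Connection.Method.GET)
                .execute();

        Map<String, String> form = hiddenFields(loginForm.parse());
        // ASSUMPTION: the user name field follows the same naming as the password field.
        form.put("ctl00$ContentPlaceHolder1$loginframe$UserName", user);
        form.put("ctl00$ContentPlaceHolder1$loginframe$Password", password);
        form.put("ctl00$ContentPlaceHolder1$loginframe$LoginButton", "Sign In");

        // Step 2: POST the complete form with the cookies from step 1.
        // Redirects are left at jsoup's default (followed), so the parsed
        // document is the page the login redirects to, if any.
        Connection.Response result = Jsoup.connect(loginUrl)
                .data(form)
                .cookies(loginForm.cookies())
                .method(Connection.Method.POST)
                .execute();
        return result.parse();
    }
}
```

Calling LoginSketch.login with the login page URL and the credentials would return a Document that can then be passed to the table-parsing code. Whether this fixes the empty column depends on how the server builds the page, which cannot be confirmed from the thread.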

2 Answers


You can do it this way:

package com.github.davidepastore.stackoverflow38415236;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

/**
 * Stackoverflow 38415236 answer.
 *
 */
public class App {
    public static void main(String[] args) {
        String html = "<table><tr>\r\n"
            + "  <td style=\"width: 30%; font-weight: bold; background-color: #d7e8ff; \">\r\n"
            + "  <span style=\"font-size: 10pt\">Registration Date</span></td>\r\n"
            + "\r\n"
            + " <td style=\"margin-bottom: 1px; padding-bottom: 1px; background-color:  \r\n"
            + "     lemonchiffon;\">\r\n"
            + "\r\n"
            + "  <span id=\"ContentPlaceHolder1_636045303384071212\">الخميس 31/12/2009</span>\r\n"
            + "  <span id=\"ContentPlaceHolder1_iInstalldate\"></span></td>\r\n"
            + " </tr>\r\n"
            + " <tr>\r\n"
            + " <td style=\"width: 30%; font-weight: bold; background-color: #d7e8ff; \r\n"
            + "     \">Account Type</td>\r\n"
            + " <td style=\"margin-bottom: 1px; padding-bottom: 1px; background-color: \r\n"
            + "       lemonchiffon;\">\r\n"
            + "\r\n"
            + "<span id=\"ContentPlaceHolder1_636045303384071212\">1 Mbps---فضي</span>\r\n"
            + "<span id=\"ContentPlaceHolder1_iAcctType\"></span></td>\r\n"
            + "</tr></table>";
        Document doc = Jsoup.parse(html);
        Element table = doc.select("table").first();
        Elements trs = table.select("tr");
        for (Element tr : trs) {
            Elements td = tr.select("td");
            Element firstTd = td.first();
            Element secondTd = td.get(1);
            System.out.println(firstTd.text() + " --- " + secondTd.text());
        }
    }
}

The output is:

Registration Date --- الخميس 31/12/2009
Account Type --- 1 Mbps---فضي

7 Comments

Sorry, this code also gives the same output, with the second-column values blank: Registration Date --- Account Type --- Total Account Credit --- Used Credit --- Valid Credit --- Credit Expiry Date ---
I have just posted the page source above. Both your code and mine retrieve the value of the second span, with id ContentPlaceHolder1_iInstalldate, which is empty, instead of the first span, ContentPlaceHolder1_636045303384071212. The numeric part (636045303384071212) of that span's id changes each time I refresh the page, which means the id is generated randomly. So how do I get the value of table contents that are generated dynamically?
Even if I try to read the span by its id using jsoup, it skips the span whose id is dynamically generated.
Which version of Jsoup are you using? Check my updated answer.
You have hard-coded the HTML source as a string. In my case it works if I hard-code it, but if I fetch it from the page and parse it, it doesn't work and doesn't retrieve the value of that dynamic field.

Use selectors to target the elements directly and then extract their text.

    Document doc = Jsoup.parse(htmlContent);

    Elements rows = doc.select("table[border=1] tr");

    for (Element row : rows) {
        String key = row.select("td:first-child").text();
        String value = row.select("td:nth-child(2) span:first-child").text();
        System.out.println("key=" + key + " value=" + value);
    }

Output

key=Registration Date value=الخميس 31/12/2009
key=Account Type value=1 Mbps---فضي

4 Comments

Sorry Zack, this also works when I copy and paste the HTML source, but not when I fetch the source directly.
Actually, each time I access the page the id of that td column changes, and I feel that my code is not reading that td field in the HTML source.
I tried to display the whole HTML I retrieved after parsing, but it shows only the first few bytes of the source, even with maxByte set to zero. Could you please tell me how to display the whole HTML, so that I can check whether the dynamic column field is present in the output of Jsoup.parse?
Could it be that the HTML you are trying to parse is generated dynamically using JavaScript and is not part of the source delivered by the server? Try comparing the source (right click, view source) to the rendered content (right click, inspect) and see if they match. If not, then you need to render the HTML using something like HtmlUnit. Here is an example: stackoverflow.com/a/38572859/1176178
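
On the question of displaying the whole HTML: if the truncated output was seen through Log.e on Android, one possible cause is that logcat cuts off long entries (the limit is roughly 4 KB per message), not that jsoup fetched a short document. A minimal sketch for working around that, assuming this is the cause, is to split the string into pieces and log each piece separately. The class name LogChunks and the chunk size are hypothetical choices for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class LogChunks {

    /** Splits text into pieces of at most maxLen characters, in order. */
    static List<String> split(String text, int maxLen) {
        if (maxLen <= 0) {
            throw new IllegalArgumentException("maxLen must be positive");
        }
        List<String> parts = new ArrayList<>();
        for (int start = 0; start < text.length(); start += maxLen) {
            int end = Math.min(text.length(), start + maxLen);
            parts.add(text.substring(start, end));
        }
        return parts;
    }
}
```

With this in place, something like `for (String part : LogChunks.split(doc.outerHtml(), 3000)) Log.e("test", part);` would print the parsed document piece by piece, which makes it possible to check whether the span with the generated id is present in what the server actually returned.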
