I'm using Jsoup to extract text from a website, but I can't figure out how to select specific rows of data inside nested tables. I need the plain text that follows the Property Address: and Mailing Address: labels so I can store it.
Here is the HTML source I am parsing:
<table width="730" border="0" cellspacing="0" cellpadding="2">
<tr>
<td><table width="730" border="0" cellspacing="0" cellpadding="2">
<tr>
<td><h1>Property Information</h1>
<table width="758">
<tr>[IRRELEVANT]</tr>
<tr>[IRRELEVANT]</tr>
<tr>
<td colspan="3"><strong>Property Address:</strong> !!THIS PLAIN TEXT HERE IS WHAT I NEED!! DATA1</td>
<td> </td>
</tr>
<tr>
<td colspan="3"><strong>Mailing Address:</strong>!!NEED THIS TOO!! DATA2</td>
<td> </td>
</tr>
<tr>[IRRELEVANT]</tr>...................
I started from the code below as a template, but it doesn't work, and I have no idea how to fix it.
Document documentSerialNumberPageData = Jsoup.connect(stringURLOfSerialNumberPage).get(); //connect to serial number page
Elements elementsSerialNumberPageData = documentSerialNumberPageData.select("#tabletext tbody > tr > td > tbody > tr > td > tbody > tr > td"); //this is not even remotely correct... :(
Element elementAddress = elementsSerialNumberPageData.get(0);
System.out.println(elementAddress.text());
My knowledge of HTML/CSS is very limited, but I'm proficient in Java. Any suggestions? Thanks! Full source here: https://github.com/PhotonPhighter/NODScraper/blob/master/src/nodscraper/Main.java
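
For what it's worth, here is one possible direction, offered as a sketch rather than a confirmed fix. The HTML shown above contains no element with the id tabletext, and the selector steps from td straight to tbody without going through table, so it most likely matches nothing, and get(0) on an empty Elements throws IndexOutOfBoundsException. Instead of walking the table nesting, it may be simpler to find the strong label by its text and read the text owned by its parent td. The class name AddressExtractor, the helper extractField, and the sample addresses are all made up for illustration; SAMPLE_HTML is a trimmed-down stand-in for the real page.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class AddressExtractor {

    // Hypothetical, shortened stand-in for the real page; the addresses are invented.
    static final String SAMPLE_HTML =
        "<table><tr><td><h1>Property Information</h1>"
      + "<table width=\"758\">"
      + "<tr><td colspan=\"3\"><strong>Property Address:</strong> 123 Main St</td><td> </td></tr>"
      + "<tr><td colspan=\"3\"><strong>Mailing Address:</strong>PO Box 9</td><td> </td></tr>"
      + "</table></td></tr></table>";

    // Finds the first <strong> directly inside a <td> whose own text contains the label,
    // then returns the text that belongs to that <td> itself (the label is excluded
    // because it lives in the child <strong>). Returns null when no label matches.
    // Assumes the label contains no parentheses, since it is pasted into the selector.
    static String extractField(Document doc, String label) {
        Element strong = doc.select("td > strong:containsOwn(" + label + ")").first();
        if (strong == null) {
            return null;
        }
        return strong.parent().ownText().trim();
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(SAMPLE_HTML);
        System.out.println(extractField(doc, "Property Address:"));
        System.out.println(extractField(doc, "Mailing Address:"));
    }
}
```

With the real page, the Document returned by Jsoup.connect(stringURLOfSerialNumberPage).get() could be passed to extractField in place of the parsed sample. This assumes the live markup matches the snippet above, with each label in a strong tag directly inside the td that holds the data.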