I want to extract a specific table from an HTML document that contains multiple tables, but unfortunately the tables have no identifiers. Each table does have a title, however. I just can't seem to figure it out.
Here is an example HTML file:
<BODY>
<TABLE>
<TH>
<H3> <BR>TABLE 1 </H3>
</TH>
<TR>
<TD>Data 1 </TD>
<TD>Data 2 </TD>
</TR>
<TR>
<TD>Data 3 </TD>
<TD>Data 4 </TD>
</TR>
<TR>
<TD>Data 5 </TD>
<TD>Data 6 </TD>
</TR>
</TABLE>
<TABLE>
<TH>
<H3> <BR>TABLE 2 </H3>
</TH>
<TR>
<TD>Data 7 </TD>
<TD>Data 8 </TD>
</TR>
<TR>
<TD>Data 9 </TD>
<TD>Data 10 </TD>
</TR>
<TR>
<TD>Data 11 </TD>
<TD>Data 12 </TD>
</TR>
</TABLE>
</BODY>
I can use BeautifulSoup 4 to get tables by id or name, but I need a single table that is identifiable only by its position (or its title).
I know that I can get the first table with:
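from bs4 import BeautifulSoup
tmp = f.read()  # f is assumed to be an already-open file object for the HTML document
soup = BeautifulSoup(tmp, "html.parser")  # parse the markup into a navigable tree
table = soup.find('table')  # returns only the first <table>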
but how would I get the second table?
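For reference, here is a minimal sketch of two approaches that might work, not a definitive answer. It assumes the built-in "html.parser" backend, uses a trimmed copy of the sample above as the `html` string, and introduces a hypothetical helper named `table_by_title` that is not part of BeautifulSoup. The first approach indexes the list returned by `find_all`; the second looks for the `<h3>` title and walks up to the enclosing table.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the sample document (assumption: same structure as above).
html = """
<BODY>
<TABLE>
<TH><H3> <BR>TABLE 1 </H3></TH>
<TR><TD>Data 1 </TD><TD>Data 2 </TD></TR>
</TABLE>
<TABLE>
<TH><H3> <BR>TABLE 2 </H3></TH>
<TR><TD>Data 7 </TD><TD>Data 8 </TD></TR>
</TABLE>
</BODY>
"""

soup = BeautifulSoup(html, "html.parser")

# By position: find_all returns every match in document order,
# so index 1 is the second table (indices are 0-based).
tables = soup.find_all("table")
second_table = tables[1]


def table_by_title(soup, title):
    """Hypothetical helper: return the table whose <h3> text equals title."""
    for heading in soup.find_all("h3"):
        # strip=True drops the whitespace around the title text
        if heading.get_text(strip=True) == title:
            return heading.find_parent("table")
    return None  # no table with that title


same_table = table_by_title(soup, "TABLE 2")
```

Indexing by position is the shortest option, but it breaks if a table is added or removed earlier in the document; matching on the title is a little more code and depends on the title text staying the same.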