
I want to extract a specific table from an HTML document that contains multiple tables, but unfortunately the tables have no identifiers. Each table does have a title, however. I just can't figure out how to use it.

Here is an example HTML file:

<BODY>
<TABLE>
<TH>
<H3>    <BR>TABLE 1    </H3>
</TH>
<TR>
<TD>Data 1    </TD>
<TD>Data 2    </TD>
</TR>
<TR>
<TD>Data 3    </TD>
<TD>Data 4    </TD>
</TR>
<TR>
<TD>Data 5    </TD>
<TD>Data 6    </TD>
</TR>
</TABLE>

<TABLE>
<TH>
<H3>    <BR>TABLE 2    </H3>
</TH>
<TR>
<TD>Data 7    </TD>
<TD>Data 8    </TD>
</TR>
<TR>
<TD>Data 9    </TD>
<TD>Data 10    </TD>
</TR>
<TR>
<TD>Data 11    </TD>
<TD>Data 12    </TD>
</TR>
</TABLE>
</BODY>

I can use BeautifulSoup 4 to get tables by id or name, but I need a single table that is identifiable only by position.

I know that I can get the first table with:

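from bs4 import BeautifulSoup

tmp = f.read()  # f is an already-open file object for the HTML document
soup = BeautifulSoup(tmp, "html.parser")  # parse the markup into a tree
table = soup.find('table')  # returns the first <table> in the document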

but how would I get the second table?

2 Answers


You can rely on the table title.

Find the text node by passing a function as the value of the text argument, then get its parent table:

table_name = "TABLE 1" 

table = soup.find(text=lambda x: x and table_name in x).find_parent('table')
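
For reference, here is a minimal self-contained sketch of the same idea, run against a shortened copy of the HTML from the question. The helper name find_table_by_title and the choice of "html.parser" are my own assumptions, not part of the answer above. It also uses string= rather than text=, which is the name this argument has had since Beautiful Soup 4.4.

```python
from bs4 import BeautifulSoup

# Shortened copy of the sample document from the question.
HTML = """
<BODY>
<TABLE>
<TH><H3>    <BR>TABLE 1    </H3></TH>
<TR><TD>Data 1    </TD><TD>Data 2    </TD></TR>
</TABLE>
<TABLE>
<TH><H3>    <BR>TABLE 2    </H3></TH>
<TR><TD>Data 7    </TD><TD>Data 8    </TD></TR>
</TABLE>
</BODY>
"""


def find_table_by_title(soup, title):
    """Return the <table> containing a text node that includes `title`, or None.

    Hypothetical helper for illustration only.
    """
    # Match the first text node containing the title; guard against None.
    node = soup.find(string=lambda s: s and title in s)
    # Walk up from the text node to its enclosing <table>, if there is a match.
    return node.find_parent("table") if node is not None else None


soup = BeautifulSoup(HTML, "html.parser")
table = find_table_by_title(soup, "TABLE 2")
cells = [td.get_text(strip=True) for td in table.find_all("td")]
print(cells)
```

Note that this is a substring match, so a title such as "TABLE 1" would also match "TABLE 10". If the document can contain both, comparing s.strip() == title inside the lambda is probably the safer check.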

4 Comments

Just curious, why x and? Surely bool(x) would return True if table_name in x did, no? Are you just short-circuiting for performance?
@jedwards it's just a function argument, you can actually name it however you want, probably text would be a better choice.
@jedwards we are checking for x since it can also be None which would cause a TypeError without this extra check.
the second comment was what I was wondering about -- makes perfect sense.
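
To make the None point from these comments concrete, here is a small sketch in plain Python (no BeautifulSoup needed). The function names title_matches and unguarded are hypothetical, chosen only for this illustration.

```python
def title_matches(text, table_name="TABLE 1"):
    """Guarded predicate: returns False for None or empty text instead of raising."""
    return bool(text) and table_name in text


def unguarded(text, table_name="TABLE 1"):
    """Same check without the guard; fails when text is None."""
    return table_name in text


guarded_result = title_matches(None)          # short-circuits, no exception
padded_result = title_matches("    TABLE 1    ")

try:
    unguarded(None)
    raised = False
except TypeError:
    # `"TABLE 1" in None` is not a valid membership test.
    raised = True

print(guarded_result, padded_result, raised)
```

The x and ... in the answer's lambda plays the same role as bool(text) and ... here: it stops the membership test from ever running against None.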

If it's only identifiable by position, meaning it's always the 2nd table in the website, you could do:

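from bs4 import BeautifulSoup

tmp = f.read()  # f is an already-open file object for the HTML document
soup = BeautifulSoup(tmp, "html.parser")

# find_all returns every <table> in document order; index 1 is the second one
all_tables = soup.find_all('table')
second_table = all_tables[1]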
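
A slightly more defensive sketch of the positional approach, in case the page sometimes has fewer tables than expected. The helper name nth_table, the tiny inline HTML, and the "html.parser" choice are assumptions made for this example.

```python
from bs4 import BeautifulSoup

# Minimal stand-in for the real document: two tables, one cell each.
HTML = (
    "<table><tr><td>Data 1</td></tr></table>"
    "<table><tr><td>Data 7</td></tr></table>"
)


def nth_table(soup, index):
    """Return the table at zero-based `index`, or None if it does not exist.

    Hypothetical helper for illustration only.
    """
    tables = soup.find_all("table")
    if not 0 <= index < len(tables):
        return None
    return tables[index]


soup = BeautifulSoup(HTML, "html.parser")
second = nth_table(soup, 1)
first_cell = second.find("td").get_text(strip=True)
missing = nth_table(soup, 5)  # out of range, so None rather than IndexError
print(first_cell, missing)
```

Keep in mind that find_all searches recursively, so tables nested inside other tables are counted too, and the position refers to every table element in document order.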
