I am parsing a table in a saved .html document. The HTML looks like this:
<table id="detailBody" width="100%" cellspacing="0" cellpadding="0" border="0" class="tab2" style="display: block;"><tbody>
<tr><td><ul><li><span>15:00:19</span><span class="red">11.750</span><span class="red">5392</span><span class="fr red">↑</span></li><li><span>14:56:55</span><span class="red">11.750</span><span class="red">17</span><span class="fr red">↑</span></li><li><span>14:56:52</span><span class="red">11.750</span><span class="red">479</span><span class="fr red">↑</span></li><li><span>14:56:49</span><span class="">11.740</span><span class="green">6</span><span class="fr green">↓</span></li><li><span>14:56:46</span><span class="">11.740</span><span class="green">333</span><span class="fr green">↓</span></li><li><span>14:56:43</span><span class="">11.740</span><span class="green">21</span><span class="fr green">↓</span></li><li><span>14:56:40</span><span class="">11.740</span><span class="green">15</span><span class="fr green">↓</span></li><li><span>14:56:37</span><span class="">11.740</span><span class="green">35</span><span class="fr green">↓</span></li><li><span>14:56:34</span><span class="red">11.750</span><span class="red">11</span><span class="fr red">↑</span></li><li><span>14:56:31</span><span class="">11.740</span><span class="green">3</span><span class="fr green">↓</span></li><li><span>14:56:28</span><span class="">11.740</span><span class="green">24</span><span class="fr green">↓</span></li><li><span>14:56:22</span><span class="red">11.750</span><span class="red">291</span><span class="fr red">↑</span></li><li><span>14:56:19</span><span class="">11.740</span><span class="red">198</span><span class="fr red">↑</span></li><li><span>14:56:16</span><span class="green">11.730</span><span class="green">15</span><span class="fr green">↓</span></li></ul></td></tr>
</tbody></table>
What I have so far is:
list_a = soup.find_all('table')[0].tbody.find_all("tr")
for a in list_a:
    for b in a:
        for c in b:
            for d in c:
                for e in d:
                    print e.renderContents()
It doesn't look very nice, but the result is:
15:00:19
11.750
5392
↑
14:56:55
11.750
17
↑
14:56:52
11.750
479
↑
However, the table contains far too many rows. I only want the first 10 groups of data, and from each group only the 3rd and 4th items, collected into 2 lists,
i.e.
["5392", "17", "479", …]
and
["↑", "↑", "↑", …] # the "↑" can be replaced with some other equivalent marker if it's a problem
How can I achieve that? Thanks.
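
For what it's worth, here is a minimal sketch of the kind of result I'm after. It assumes Python 3 and the bs4 package with the built-in "html.parser" backend (my snippet above is Python 2), and it treats each <li> as one group. The inline html string is a shortened stand-in for the saved page, and the list names volumes and directions are just illustrative, so this may well not be the best way to do it:

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the saved page; the real file has many more <li> rows.
html = '''<table id="detailBody"><tbody>
<tr><td><ul>
<li><span>15:00:19</span><span class="red">11.750</span><span class="red">5392</span><span class="fr red">↑</span></li>
<li><span>14:56:55</span><span class="red">11.750</span><span class="red">17</span><span class="fr red">↑</span></li>
<li><span>14:56:49</span><span class="">11.740</span><span class="green">6</span><span class="fr green">↓</span></li>
</ul></td></tr>
</tbody></table>'''

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", id="detailBody")

volumes = []      # hypothetical name: 3rd <span> of each group
directions = []   # hypothetical name: 4th <span> of each group

# Each <li> is one group; the slice keeps at most the first 10 of them.
for li in table.find_all("li")[:10]:
    spans = li.find_all("span")
    if len(spans) < 4:
        continue  # skip any malformed group rather than raising IndexError
    volumes.append(spans[2].get_text())
    directions.append(spans[3].get_text())

print(volumes)
print(directions)
```

Selecting the <li> elements directly avoids the five nested loops, and slicing with [:10] also works when the table has fewer than 10 groups.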
