0

I am parsing a table in a saved .html document, which looks like this:

[image of the table]

The HTML is:

<table id="detailBody" width="100%" cellspacing="0" cellpadding="0" border="0" class="tab2" style="display: block;"><tbody>
                                        <tr><td><ul><li><span>15:00:19</span><span class="red">11.750</span><span class="red">5392</span><span class="fr red">↑</span></li><li><span>14:56:55</span><span class="red">11.750</span><span class="red">17</span><span class="fr red">↑</span></li><li><span>14:56:52</span><span class="red">11.750</span><span class="red">479</span><span class="fr red">↑</span></li><li><span>14:56:49</span><span class="">11.740</span><span class="green">6</span><span class="fr green">↓</span></li><li><span>14:56:46</span><span class="">11.740</span><span class="green">333</span><span class="fr green">↓</span></li><li><span>14:56:43</span><span class="">11.740</span><span class="green">21</span><span class="fr green">↓</span></li><li><span>14:56:40</span><span class="">11.740</span><span class="green">15</span><span class="fr green">↓</span></li><li><span>14:56:37</span><span class="">11.740</span><span class="green">35</span><span class="fr green">↓</span></li><li><span>14:56:34</span><span class="red">11.750</span><span class="red">11</span><span class="fr red">↑</span></li><li><span>14:56:31</span><span class="">11.740</span><span class="green">3</span><span class="fr green">↓</span></li><li><span>14:56:28</span><span class="">11.740</span><span class="green">24</span><span class="fr green">↓</span></li><li><span>14:56:22</span><span class="red">11.750</span><span class="red">291</span><span class="fr red">↑</span></li><li><span>14:56:19</span><span class="">11.740</span><span class="red">198</span><span class="fr red">↑</span></li><li><span>14:56:16</span><span class="green">11.730</span><span class="green">15</span><span class="fr green">↓</span></li></ul></td></tr>
                                    </tbody></table>

What I have so far is:

list_a = soup.find_all('table')[0].tbody.find_all("tr")

for a in list_a:
    for b in a:
        for c in b:
            for d in c:
                for e in d:
                    print e.renderContents()

It doesn't look very nice, but the result is:

15:00:19
11.750
5392
↑
14:56:55
11.750
17
↑
14:56:52
11.750
479
↑

However, the table has too many rows. I only want the first 10 groups of data, with the 3rd and 4th items of each group put into two lists, i.e.

["5392", "17", "479", ...]

and

["↑", "↑", "↑", ...]  # the "↑" can be replaced with some equivalent marker if it causes a problem

How can I achieve that? Thanks.

3 Comments
  • Add the HTML, not the image. Commented Dec 1, 2015 at 9:26
  • I think he wanted to say that you should add the actual html code and not only the image so we can help you better ;) Commented Dec 1, 2015 at 10:11
  • @SIslam and nablahero, thanks for the comments. Commented Dec 1, 2015 at 14:52

2 Answers

2

Why not find all the span items directly, since they are what you actually want? Instead of

list_a = soup.find_all('table')[0].tbody.find_all("tr")

try

list_a = soup.find_all('table')[0].tbody.find_all("tr")[0].find_all("span")

I don't know whether your table has only one row. If it does, this should work: it gives you all the spans, and you just skip the ones you do not need. If you have multiple rows, you have to iterate over the rows like this:

list_a = soup.find_all('table')[0].tbody.find_all("tr")
spans = []
for a in list_a:
    # gather the spans of each row into one flat list
    spans.extend(a.find_all("span"))

and again you will get all the span items. I hope this points you in the right direction!
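One way to do the skipping is to slice with a step of 4, which picks out every 3rd and 4th item. This is only a sketch, and it assumes that every li holds exactly four spans (time, price, volume, direction) and that the span texts are already in one flat list. The list texts below is hard-coded from the question's first rows and stands in for something like [s.text for s in spans]. The names texts, GROUP_SIZE and MAX_GROUPS are illustrative only.

```python
# Flat list of span texts; a hard-coded stand-in for the find_all("span") result.
texts = [
    "15:00:19", "11.750", "5392", "↑",
    "14:56:55", "11.750", "17", "↑",
    "14:56:52", "11.750", "479", "↑",
    "14:56:49", "11.740", "6", "↓",
]

GROUP_SIZE = 4    # assumption: four spans per <li>
MAX_GROUPS = 10   # the question asks for the first 10 groups only

# Keep at most MAX_GROUPS complete groups, then take every 4th item
# starting at offset 2 (the 3rd item) and offset 3 (the 4th item).
limited = texts[:GROUP_SIZE * MAX_GROUPS]
volumes = limited[2::GROUP_SIZE]      # ['5392', '17', '479', '6']
directions = limited[3::GROUP_SIZE]   # ['↑', '↑', '↑', '↓']
```

The sample has only four groups, so the limit of 10 has no visible effect here. On the full table it would cut both lists to 10 entries.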

1 Comment

Thanks for the help. I hope you don't mind that I chose another answer, which addressed all my questions. :)

1

The following will extract your two columns using the span tag inside the li elements:

html = """
<table id="detailBody" width="100%" cellspacing="0" cellpadding="0" border="0" class="tab2" style="display: block;">
<tbody>
<tr>
    <td>
    <ul>
    <li><span>15:00:19</span><span class="red">11.750</span><span class="red">5392</span><span class="fr red">?</span></li>
    <li><span>14:56:55</span><span class="red">11.750</span><span class="red">17</span><span class="fr red">?</span></li>
    <li><span>14:56:52</span><span class="red">11.750</span><span class="red">479</span><span class="fr red">?</span></li>
    <li><span>14:56:49</span><span class="">11.740</span><span class="green">6</span><span class="fr green">?</span></li>
    <li><span>14:56:46</span><span class="">11.740</span><span class="green">333</span><span class="fr green">?</span></li>
    <li><span>14:56:43</span><span class="">11.740</span><span class="green">21</span><span class="fr green">?</span></li>
    <li><span>14:56:40</span><span class="">11.740</span><span class="green">15</span><span class="fr green">?</span></li>
    <li><span>14:56:37</span><span class="">11.740</span><span class="green">35</span><span class="fr green">?</span></li>
    <li><span>14:56:34</span><span class="red">11.750</span><span class="red">11</span><span class="fr red">?</span></li>
    <li><span>14:56:31</span><span class="">11.740</span><span class="green">3</span><span class="fr green">?</span></li>
    <li><span>14:56:28</span><span class="">11.740</span><span class="green">24</span><span class="fr green">?</span></li>
    <li><span>14:56:22</span><span class="red">11.750</span><span class="red">291</span><span class="fr red">?</span></li>
    <li><span>14:56:19</span><span class="">11.740</span><span class="red">198</span><span class="fr red">?</span></li>
    <li><span>14:56:16</span><span class="green">11.730</span><span class="green">15</span><span class="fr green">?</span></li>
    </ul>
    </td>
</tr>
</tbody></table>"""

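from bs4 import BeautifulSoup

# name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(html, "html.parser")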

col_3 = []
col_4 = []

for li in soup.find_all('table')[0].find_all("li"):
    cols = li.find_all("span")
    col_3.append(cols[2].text)
    col_4.append(cols[3].text)

print(col_3)
print(col_4)

This prints both columns for all 14 rows (to keep only the first 10 groups, as the question asks, loop over find_all("li")[:10] instead):

['5392', '17', '479', '6', '333', '21', '15', '35', '11', '3', '24', '291', '198', '15']
['?', '?', '?', '?', '?', '?', '?', '?', '?', '?', '?', '?', '?', '?']
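The ? entries come from the HTML pasted into this answer, where ? stands in for the question's ↑ and ↓. Since the question allows swapping the arrows for another marker, one option is to map them to plain ASCII labels after extraction. This is a minimal sketch, assuming the parsed text contains the original arrow characters. The mapping ARROW_LABELS, the labels "up" and "down", and the sample list are illustrative and not part of the original answer.

```python
# Hypothetical mapping from arrow characters to ASCII labels.
ARROW_LABELS = {"\u2191": "up", "\u2193": "down"}  # U+2191 is ↑, U+2193 is ↓

# Sample direction column, as it would look if the arrows survived parsing.
col_4 = ["\u2191", "\u2191", "\u2191", "\u2193"]

# Unknown symbols are passed through unchanged rather than dropped.
labels = [ARROW_LABELS.get(symbol, symbol) for symbol in col_4]

print(labels)  # ['up', 'up', 'up', 'down']
```

Using dict.get with the symbol itself as the default means an unexpected character, such as the ? above, stays visible in the result instead of raising a KeyError.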

1 Comment

This is perfect. Thanks for the guidance, which will benefit many learners.
