I have the following HTML structure, but I don't know how to extract the values text1 and text2 from <td><a href=".....">text1</a>text2</td>:

<tbody>
        <tr class="trBgGrey"><td nowrap="nowrap">1</td><td nowrap="nowrap">11</td><td class="tdAlignL font13 fontStyle" nowrap="nowrap"><a href="http://www.hkjc.com/english/racing/horse.asp?horseno=S205">SWEET BEAN</a>(S205)</td><td class="tdAlignL font13 fontStyle" nowrap="nowrap"><a href="http://www.hkjc.com/english/racing/jockeyprofile.asp?jockeycode=MOJ&amp;season=Current">J Moreira</a></td><td class="tdAlignL font13 fontStyle" nowrap="nowrap"><a href="http://www.hkjc.com/english/racing/trainerprofile.asp?trainercode=FC&amp;season=Current">C Fownes</a></td><td nowrap="nowrap">121</td><td nowrap="nowrap">1034</td><td nowrap="nowrap">7</td><td nowrap="nowrap">-</td><td align="center" nowrap="nowrap"><table width="80" border="0" cellSpacing="0" cellPadding="0"><tr><td width="16" align="center">8</td><td width="16" align="center">8</td><td width="16" align="center">8</td><td width="16" align="center">3</td><td width="16" align="center">1</td></tr></table></td><td nowrap="nowrap">1.51.13</td><td nowrap="nowrap">5.3</td></tr>
</tr><tr class="trBgGrey"><td nowrap="nowrap">3</td><td nowrap="nowrap">2</td><td class="tdAlignL font13 fontStyle" nowrap="nowrap"><a href="http://www.hkjc.com/english/racing/horse.asp?horseno=V311">CITY WINNER</a>(V311)</td><td class="tdAlignL font13 fontStyle" nowrap="nowrap"><a href="http://www.hkjc.com/english/racing/jockeyprofile.asp?jockeycode=RN&amp;season=Current">N Rawiller</a></td><td class="tdAlignL font13 fontStyle" nowrap="nowrap"><a href="http://www.hkjc.com/english/racing/trainerprofile.asp?trainercode=TYS&amp;season=Current">Y S Tsui</a></td><td nowrap="nowrap">132</td><td nowrap="nowrap">978</td><td nowrap="nowrap">6</td><td nowrap="nowrap">1</td><td align="center" nowrap="nowrap"><table width="80" border="0" cellSpacing="0" cellPadding="0"><tr><td width="16" align="center">9</td><td width="16" align="center">9</td><td width="16" align="center">9</td><td width="16" align="center">10</td><td width="16" align="center">3</td></tr></table></td><td nowrap="nowrap">1.51.30</td><td nowrap="nowrap">22</td></tr>
        </tbody>

I tried the following code, but it does not return those text values:

from bs4 import BeautifulSoup
import urllib.request

race_link = 'http://racing.hkjc.com/racing/info/meeting/Results/English/Local/20171227/HV'
sauce1 = urllib.request.urlopen(race_link).read()
soup1 = BeautifulSoup(sauce1, 'html.parser')

for link in soup1.find_all('tr', {'class': 'trBgGrey'}):
    for ilink in link.find_all('td'):
        print(ilink.string)

But the output is:

1
11
None
J Moreira
C Fownes
121
1034
7
-
None
8
8
8
3
1
1.51.13
5.3
.....

My expected output is:

1
11
SWEET BEAN
(S205)
J Moreira
C Fownes
121
1034
7
-
None
8
8
8
3
1
1.51.13
5.3
......

I can get the values when the HTML is structured as

<td>text1</td><td>text2</td>

But I don't know how to get the values when the HTML is structured as

<td><a href="....">text1</a>text2</td>

How can I get the values from the second structure?

  • I mean, I would like to extract text1 and text2 from the following html structure: Commented Jan 1, 2018 at 8:35
  • You want the horse name and ID? Commented Jan 1, 2018 at 8:38
  • Sorry, this is my first post here and I left something out. I have amended the question. In fact, I want to know how to get the values (text1 and text2) inside an HTML structure like this: <td><a href="......">text1</a>text2</td> Commented Jan 1, 2018 at 8:38
  • @cᴏʟᴅsᴘᴇᴇᴅ: In fact, I need all values, including the horse name and ID. Right now I can get every value except the horse name and ID, and I want those as well. Thanks! Commented Jan 1, 2018 at 8:40
  • 1. Please add an example of the expected output. 2. The code you added does not give the output you posted; for example, there is no <tr> and no class trBgGrey. Commented Jan 1, 2018 at 8:47

1 Answer

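Try something like this. .string returns None when a tag has more than one child, which is the case for the <td> that holds both the <a> and the trailing text, so walk the children recursively and print every string: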

from bs4 import element

def print_strings(elem):
    # Walk the children: recurse into tags, print text nodes as they come.
    for c in elem.children:
        if isinstance(c, element.Tag):
            print_strings(c)
        else:
            print(c, end=" ")

# soup1 is the BeautifulSoup object built in the question.
for link in soup1.find_all('tr', {'class': 'trBgGrey'}):
    for ilink in link.find_all('td'):
        print_strings(ilink)
        print()
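
As an alternative, here is a minimal sketch that relies on BeautifulSoup's built-in stripped_strings generator instead of a hand-written recursion. It also passes recursive=False to find_all, because a plain find_all('td') descends into the nested table and visits those inner cells a second time. The html string is a trimmed-down copy of one row from the question (an assumption about the shape of the live page), and cell_texts is a hypothetical helper name used only for illustration.

```python
from bs4 import BeautifulSoup

# Trimmed-down copy of one row from the question; the live page is assumed
# to have the same shape.
html = """
<table><tbody>
<tr class="trBgGrey"><td nowrap="nowrap">1</td><td nowrap="nowrap">11</td><td class="tdAlignL font13 fontStyle" nowrap="nowrap"><a href="http://www.hkjc.com/english/racing/horse.asp?horseno=S205">SWEET BEAN</a>(S205)</td><td nowrap="nowrap">121</td><td align="center"><table><tr><td>8</td><td>3</td><td>1</td></tr></table></td><td nowrap="nowrap">5.3</td></tr>
</tbody></table>
"""

soup = BeautifulSoup(html, "html.parser")


def cell_texts(row):
    """Return one list of strings per top-level <td> of a row (hypothetical helper)."""
    cells = []
    # recursive=False looks only at direct children of the row, so the cells
    # of the nested table are not returned as separate entries.
    for td in row.find_all("td", recursive=False):
        # stripped_strings yields every text node under the tag, whitespace-stripped.
        cells.append(list(td.stripped_strings))
    return cells


rows = [cell_texts(tr) for tr in soup.find_all("tr", class_="trBgGrey")]
for row in rows:
    for texts in row:
        print(" ".join(texts))
```

Each top-level cell becomes a list, so the horse cell keeps the name and the ID as two separate items, and the nested table collapses into a single list of its numbers. If one string per cell is enough, td.get_text(" ", strip=True) should give a similar result.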


1 Comment

@how2code Please accept the answer if it helped solve your question.
