I'm trying to scrape data from a table using BeautifulSoup. Each row comes back as a single run-together string such as [u'A Southern RV, Inc.1642 E New York AveDeland, FLPhone: (386) 734-5678Website: www.southernrvrentals.comEmail: [email protected]\xa0\n'], from a table whose rows look like this:

<table id="ctl00_TemplateBody_WebPartManager1_gwpste_container_SearchForm_ciSearchForm_RTable" border="0">
                            <tbody><tr style="background-color:#990000;">
                                <th align="left" colspan="3" style="margin-top:5px;margin-bottom:5px;"><span id="ctl00_TemplateBody_WebPartManager1_gwpste_container_SearchForm_ciSearchForm_RSCount" style="color:White;">Your search results returned (85) records </span></th>
                            </tr><tr>
                                <td class="ml15" align="left" valign="top"><img src="./RVDealers-Florida_files/AfterMarket2.gif" alt="After Market Member Logo" border="0"> </td><td class="ml15" align="left" valign="top"><span style="font-weight:bold;">A Southern RV, Inc.</span><br>1642 E New York Ave<br>Deland, FL<br>Phone: (386) 734-5678<br>Website: <a href="http://www.southernrvrentals.com/" target="_blank">www.southernrvrentals.com</a><br>Email: <a href="mailto:[email protected]" target="_blank">[email protected]</a></td><td class="ml15" align="left" valign="top">&nbsp;</td>
                            </tr><tr>
                                <td colspan="3"><hr></td>
                            </tr><tr>
                                <td class="ml15" align="left" valign="top"><img src="./RVDealers-Florida_files/AfterMarket2.gif" alt="After Market Member Logo" border="0"> </td><td class="ml15" align="left" valign="top"><span style="font-weight:bold;">Alec's Truck Trailer &amp; RV</span><br>16960 S Dixie Hwy<br>Miami, FL<br>Phone: (305) 234-5444<br>Website: <a href="http://www.alecstruck.com/" target="_blank">www.alecstruck.com</a><br>Email: <a href="mailto:[email protected]" target="_blank">[email protected]</a></td><td class="ml15" align="left" valign="top">&nbsp;</td>
                            </tr><tr>
                                <td colspan="3"><hr></td>
                            </tr><tr>
                                <td class="ml15" align="left" valign="top"><img src="./RVDealers-Florida_files/RVRAMember2.gif" alt="RVRA Member Logo" border="0"><br>  <img src="./RVDealers-Florida_files/GoRVDealer2.gif" alt="Go RV Dealer Logo" border="0"><br> </td><td class="ml15" align="left" valign="top"><span style="font-weight:bold;">All Star Coaches</span><br>131 NW 73rd Terraces, Bay 1117<br>Fort Lauderdale, FL<br>Phone: (866) 838-4465<br>Website: <a href="http://www.allstarcoaches.com/" target="_blank">www.allstarcoaches.com</a><br>Email: <a href="mailto:[email protected]" target="_blank">[email protected]</a></td><td class="ml15" align="left" valign="top">&nbsp;</td>
                            </tr><tr>
                                <td colspan="3"><hr></td>
                            </tr><tr>
                                <td class="ml15" align="left" valign="top"><img src="./RVDealers-Florida_files/RVDAMember2.gif" alt="RVDA Member Logo" border="0"><br>  <img src="./RVDealers-Florida_files/GoRVDealer2.gif" alt="Go RV Dealer Logo" border="0"><br> </td><td class="ml15" align="left" valign="top"><span style="font-weight:bold;">Alliance Coach</span><br>4505 Monaco Way<br>Wildwood, FL<br>Phone: (866) 888-8941<br>Website: <a href="http://www.alliancecoachonline.com/" target="_blank">www.alliancecoachonline.com</a><br>Email: <a href="mailto:[email protected]" target="_blank">[email protected]</a></td><td class="ml15" align="left" valign="top"><table width="100%" border="0" cellpadding="0" cellspacing="5"><tbody><tr><td valign="top" width="75" align="left"><img src="./RVDealers-Florida_files/Cert_web.jpg" height="75" width="75" alt="Certified RV Technician" border="0"></td> <td valign="top" style="font-size:8px;font-weight:bold;" align="left" nowrap=""><img src="./RVDealers-Florida_files/RVLCenter_web.jpg" height="33" width="93" alt="RV Learning Center Certifications" border="0"><br>&nbsp;Certifications:<ul><li style="font-size:7px;">&nbsp;Service Writer/Advisor</li><li style="font-size:7px;">&nbsp;Parts Specialist</li><li style="font-size:7px;">&nbsp;Parts Manager</li><li style="font-size:7px;">&nbsp;Warranty Administrator</li></ul></td></tr></tbody></table></td>
                            </tr><tr>
                                <td colspan="3"><hr></td>

The problem is that when I scrape the data, it all condenses into one long string without any spaces or carriage returns. How can I fix this? I'm using this code to extract text from the table:

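from mechanize import Browser
# Assuming bs4; with BeautifulSoup 3 this would be
# "from BeautifulSoup import BeautifulSoup".
from bs4 import BeautifulSoup

def extract(soup):
    table = soup.find("table", attrs={'id': 'ctl00_TemplateBody_WebPartManager1_gwpste_container_SearchForm_ciSearchForm_RTable'})
    data = []
    for row in table.findAll("tr"):
        # getText() joins the row's text nodes with no separator
        s = row.getText()
        data.append(s)
    return data

# BASE_URL_DIRECTORY is defined elsewhere in my script
mech = Browser()
page = mech.open(BASE_URL_DIRECTORY)
html = page.read()
soup = BeautifulSoup(html)
data = extract(soup)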

1 Answer

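getText() concatenates the text nodes, and a br tag contributes no text of its own, so the lines run together. You can use replace_with() to replace every br tag with a newline before extracting the text: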

def extract(soup):
    table = soup.find("table", attrs={'id':'ctl00_TemplateBody_WebPartManager1_gwpste_container_SearchForm_ciSearchForm_RTable'})
    for br in table.find_all('br'):
        br.replace_with('\n')
    return table.get_text().strip()

For the HTML input you've provided it prints:

A Southern RV, Inc.

1642 E New York Ave
Deland, FL
Phone: (386) 734-5678
Website: www.southernrvrentals.com
Email: [email protected]
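
If modifying the tree is undesirable, get_text() also accepts a separator and a strip flag. The sketch below is an alternative to the br replacement above, not the same technique. It assumes bs4 with the built-in "html.parser"; CELL_HTML is a trimmed copy of one cell from the question and cell_lines is a hypothetical helper name. Note the trade-off: the separator is inserted at every tag boundary, so the text of a link ends up on its own line, apart from its "Website:" label.

```python
from bs4 import BeautifulSoup

# Trimmed copy of one result cell from the question's table (assumption:
# attributes and the email line are omitted for brevity).
CELL_HTML = (
    '<td class="ml15"><span style="font-weight:bold;">A Southern RV, Inc.</span>'
    '<br>1642 E New York Ave<br>Deland, FL<br>Phone: (386) 734-5678'
    '<br>Website: <a href="http://www.southernrvrentals.com/">'
    'www.southernrvrentals.com</a></td>'
)


def cell_lines(html):
    """Return the cell's text pieces as a list, one per text node."""
    soup = BeautifulSoup(html, "html.parser")
    # separator="\n" joins text nodes with newlines; strip=True trims each
    # piece and drops pieces that are empty after trimming.
    return soup.get_text(separator="\n", strip=True).split("\n")


for line in cell_lines(CELL_HTML):
    print(line)
```

Because the split happens at every tag boundary rather than only at br tags, replacing the br tags is likely the better fit when labels and link text should stay on one line.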

2 Comments

I tried your solution but it only produced the names (in this example A Southern RV, Inc). I've included a more comprehensive sample of the HTML I'm dealing with; I'd really appreciate it if you'd take a look.
@Apollo I tried it on the example you've provided, and it does show the results nicely with newlines. Could you clarify what the problem is now? Thanks.
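
For the fuller table in the question, one string per dealer may be more useful than one string for the whole table. The following is a minimal sketch under stated assumptions, not a tested solution for the real page: it assumes bs4 with "html.parser", that each dealer row has exactly three direct td cells with the address block in the second, and that separator rows and nested certification tables should be skipped. The names extract_records and SAMPLE_HTML, the short table id "results", and logo.gif are hypothetical stand-ins.

```python
from bs4 import BeautifulSoup

# Trimmed stand-in for the question's table; the id and image name are
# hypothetical, and the real rows carry more attributes and fields.
SAMPLE_HTML = """
<table id="results">
<tr><th colspan="3">Your search results returned (85) records</th></tr>
<tr><td><img src="logo.gif"> </td><td><span>A Southern RV, Inc.</span><br>1642 E New York Ave<br>Deland, FL<br>Phone: (386) 734-5678</td><td>&nbsp;</td></tr>
<tr><td colspan="3"><hr></td></tr>
<tr><td><img src="logo.gif"> </td><td><span>Alliance Coach</span><br>4505 Monaco Way<br>Wildwood, FL<br>Phone: (866) 888-8941</td><td><table><tr><td>Certifications</td></tr></table></td></tr>
</table>
"""


def extract_records(soup, table_id):
    """Return one newline-separated string per dealer row."""
    table = soup.find("table", attrs={"id": table_id})
    records = []
    for row in table.find_all("tr"):
        # Skip rows that belong to a nested table inside a cell.
        if row.find_parent("table") is not table:
            continue
        cells = row.find_all("td", recursive=False)
        # The header row has no td and the <hr> rows have a single td.
        if len(cells) != 3:
            continue
        info = cells[1]
        # Same idea as the answer: turn each <br> into a newline.
        for br in info.find_all("br"):
            br.replace_with("\n")
        records.append(info.get_text().strip())
    return records


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
records = extract_records(soup, "results")
for record in records:
    print(record)
    print("---")
```

Checking the row's nearest parent table, rather than relying on direct children of the table, keeps the sketch working whether or not the parser or the page wraps the rows in a tbody.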
