2

I am scraping a fixed piece of content from a particular website. The content lies inside nested divs as shown below:

<div class="table-info">
  <div>
    <span>Time</span>
    <div class="overflow-hidden">
      <strong>Full</strong>
    </div>
  </div>
  <div>
    <span>Branch</span>
    <div class="overflow-hidden">
      <strong>IT</strong>
    </div>
  </div>
  <div>
    <span>Type</span>
    <div class="overflow-hidden">
      <strong>Standard</strong>
    </div>
  </div>
  <div>
    <span>contact</span>
    <div class="overflow-hidden">
      <strong>my location</strong>
    </div>
  </div>
</div>

I want to retrieve only the content of the strong tag inside the 'overflow-hidden' div that belongs to the span whose text is Branch. The code I've used is:

from bs4 import BeautifulSoup
import urllib2 
url = urllib2.urlopen("https://www.xyz.com")
content = url.read()
soup = BeautifulSoup(content)
type = soup.find('div',attrs={"class":"table-info"}).findAll('span')
print type

I've scraped all the span content inside the main div 'table-info', so that I can use a conditional statement to retrieve the required content. But if I try to scrape the div content inside the span like this:

type = soup.find('div',attrs={"class":"table-info"}).findAll('span').find('div')
print type

I get this error:

AttributeError: 'list' object has no attribute 'find'

Can anyone please give me some idea how to retrieve the content of the div in the span? Thank you. I'm using Python 2.7.

2 Answers

1

It seems you want the content of the second div inside the "table-info" div. However, you are trying to reach it through the span, which does not contain that div.

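type = soup.find('div',attrs={"class":"table-info"}).findAll('span').find('div')

raises the error because findAll returns a list of tags, and a list has no find method. Calling find('div') on a single span would not help either: it would return None, since the div is a sibling of the span rather than a child.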

Try this instead:

from bs4 import BeautifulSoup
import urllib2 
url = urllib2.urlopen("https://www.xyz.com")
content = url.read()
soup = BeautifulSoup(content)
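# findAll searches recursively, so the list holds both the outer divs and
# the nested "overflow-hidden" divs; index 2 is the outer div for Branch.
divs = soup.find('div',attrs={"class":"table-info"}).findAll('div')
print divs[2].find('strong').string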
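
To see why the original chain fails, here is a minimal sketch that parses the HTML from the question inline instead of fetching the placeholder URL. It is written to run on Python 3 as well, and it assumes the page really has the structure shown in the question. The names HTML, inside_span and branch_value are only for illustration.

```python
from bs4 import BeautifulSoup

# HTML copied from the question, inlined so the sketch needs no network call
HTML = """
<div class="table-info">
  <div><span>Time</span><div class="overflow-hidden"><strong>Full</strong></div></div>
  <div><span>Branch</span><div class="overflow-hidden"><strong>IT</strong></div></div>
  <div><span>Type</span><div class="overflow-hidden"><strong>Standard</strong></div></div>
  <div><span>contact</span><div class="overflow-hidden"><strong>my location</strong></div></div>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")
table = soup.find("div", attrs={"class": "table-info"})

# find_all (findAll in older code) returns a list-like result, not a single tag
spans = table.find_all("span")
branch_span = spans[1]

# The div is a sibling of the span, so searching inside the span finds nothing
inside_span = branch_span.find("div")

# Stepping sideways to the next sibling div reaches the value
branch_value = branch_span.find_next_sibling("div").find("strong").string
print(branch_value)
```

Picking an element by index only works while the order of the rows stays the same, so treat the index as an assumption about the page.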

1 Comment

Thanks, the code worked. I guess I was following a totally wrong approach to the problem.
0

findAll returns a list of BeautifulSoup elements, and find is defined on a single element, not on a list of them, hence the error. The first part of your code is fine. Do this instead:

from bs4 import BeautifulSoup
import urllib2 

url = urllib2.urlopen("https://www.xyz.com")
content = url.read()
soup = BeautifulSoup(content)

table = soup.find('div',attrs={"class":"table-info"})
spans = table.findAll('span')
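branch_span = spans[1]
# Do your manipulation with branch_span. The value is in the div that follows it:
# branch_span.find_next_sibling('div').find('strong').string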

OR

from bs4 import BeautifulSoup
import urllib2 

url = urllib2.urlopen("https://www.xyz.com")
content = url.read()
soup = BeautifulSoup(content)

table = soup.find('div',attrs={"class":"table-info"})
spans = table.findAll('span')

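for span in spans:
    if span.text.strip().lower() == 'branch':
        # The value sits in the div that follows the span, not inside it
        print span.find_next_sibling('div').find('strong').string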
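
If you need more than one field, the same loop can collect every label and value into a dictionary. This is only a sketch under the assumption that each span is followed by an "overflow-hidden" div containing a strong tag, as in the question. The function name table_info_to_dict is made up for this example, and the HTML is inlined so it runs without the placeholder URL (on Python 3 as well).

```python
from bs4 import BeautifulSoup

# HTML copied from the question, inlined for a self-contained example
HTML = """
<div class="table-info">
  <div><span>Time</span><div class="overflow-hidden"><strong>Full</strong></div></div>
  <div><span>Branch</span><div class="overflow-hidden"><strong>IT</strong></div></div>
  <div><span>Type</span><div class="overflow-hidden"><strong>Standard</strong></div></div>
  <div><span>contact</span><div class="overflow-hidden"><strong>my location</strong></div></div>
</div>
"""


def table_info_to_dict(html):
    """Map each lower-cased span label to the text of the strong tag after it."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("div", attrs={"class": "table-info"})
    result = {}
    for span in table.find_all("span"):
        # The value div is the span's next sibling, not its child
        value_div = span.find_next_sibling("div", attrs={"class": "overflow-hidden"})
        if value_div is None or value_div.strong is None:
            continue  # skip rows that do not follow the expected layout
        label = span.get_text(strip=True).lower()
        result[label] = value_div.strong.get_text(strip=True)
    return result


info = table_info_to_dict(HTML)
print(info["branch"])
```

Looking the value up by label keeps working if the site reorders the rows, which an index such as spans[1] would not.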
