I have the following HTML code:

    <td class="image">
      <a href="/target/tt0111161/" title="Target Text 1">
       <img alt="target img" height="74" src="img src url" title="image title" width="54"/>
      </a>
     </td>
     <td class="title">
      <span class="wlb_wrapper" data-caller-name="search" data-size="small" data-tconst="tt0111161">
      </span>
      <a href="/target/tt0111161/">
       Other Text
      </a>
      <span class="year_type">
       (2013)
      </span>

I am trying to use Beautiful Soup to parse certain elements into a tab-delimited file. With some great help, I now have:

for td in soup.select('td.title'):
    span = td.select('span.wlb_wrapper')
    if span:
        print(span[0].get('data-tconst'))  # To get `tt0111161`

Now I want to get "Target Text 1".

I've tried adapting the code above, for example:

for td in soup.select('td.image'):  # trying to select the <td class="image"> tag
    img = td.select('a.title')  # from inside td I now try to look inside the a tag that also has the word title
    if img:
        print(img[2].get('title'))  # if it finds anything, then I want to return the text in class 'title'
  • Have you made any attempt at extracting it yourself? Commented Feb 6, 2014 at 1:08
  • I've edited the post above Commented Feb 6, 2014 at 1:19
  • another thread here: stackoverflow.com/questions/41369344/… Commented Dec 29, 2016 at 14:22

2 Answers

If you're trying to handle each td differently based on its class (i.e. td class="image" and td class="title"), you can index a Beautiful Soup tag like a dictionary to read its class attribute.

The code below finds every td class="image" in the table and prints the title of each link inside it.

from bs4 import BeautifulSoup

page = """
<table>
    <tr>
        <td class="image">
           <a href="/target/tt0111161/" title="Target Text 1">
            <img alt="target img" height="74" src="img src url" title="image title" width="54"/>
           </a>
          </td>
          <td class="title">
           <span class="wlb_wrapper" data-caller-name="search" data-size="small" data-tconst="tt0111161">
           </span>
           <a href="/target/tt0111161/">
            Other Text
           </a>
           <span class="year_type">
            (2013)
           </span>
        </td>
    </tr>
</table>
"""
soup = BeautifulSoup(page, 'html.parser')  # name the parser explicitly to avoid a warning
tbl = soup.find('table')
rows = tbl.find_all('tr')
for row in rows:
    cols = row.find_all('td')
    for col in cols:
        # col['class'] is a list of class names, so compare its first entry
        if col.has_attr('class') and col['class'][0] == 'image':
            hrefs = col.find_all('a')
            for href in hrefs:
                print(href.get('title'))  # "Target Text 1"

        elif col.has_attr('class') and col['class'][0] == 'title':
            spans = col.find_all('span')
            for span in spans:
                if span.has_attr('class') and span['class'][0] == 'wlb_wrapper':
                    print(span.get('data-tconst'))  # "tt0111161"
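
The same lookup can be written more compactly by letting find_all filter on the class directly. This is only a minimal sketch, assuming bs4 is installed and the markup has the same shape as the sample above; the trimmed html string and the titles name are illustrative, not part of the original answer.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the sample markup (illustrative only)
html = """
<table><tr>
  <td class="image">
    <a href="/target/tt0111161/" title="Target Text 1"><img alt="target img"/></a>
  </td>
  <td class="title">
    <span class="wlb_wrapper" data-tconst="tt0111161"></span>
    <a href="/target/tt0111161/">Other Text</a>
  </td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

# class_="image" matches any <td> whose class list contains "image";
# title=True keeps only <a> tags that actually carry a title attribute.
titles = [
    a.get("title")
    for td in soup.find_all("td", class_="image")
    for a in td.find_all("a", title=True)
]
print(titles)
```

Filtering in find_all avoids the manual has_attr and col['class'][0] checks, and it still matches when a td carries more than one class.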
4 Comments

Thanks, can I also add a statement to retrieve the value of the "data-tconst" attribute?
Yep, you can add an elif branch that looks for td's with class "title". Pasting code in a comment failed, so I will just update my answer.
Thanks, now I've just wrapped all of that in a getinfo function. Can I write the output of getinfo to a CSV?
I've personally never written to CSV, but you should be able to open a file before the iteration and, instead of printing the values, write them to the file. After the loop, close the file.
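
The idea in the last comment could look roughly like the sketch below, which collects one row per tr and writes tab-delimited output with the standard csv module. The names get_info and write_tsv, the column headers, and the trimmed html string are all assumptions made for illustration; an in-memory buffer stands in for a real file.

```python
import csv
import io
from bs4 import BeautifulSoup

# Trimmed copy of the sample markup (illustrative only)
html = """
<table><tr>
  <td class="image">
    <a href="/target/tt0111161/" title="Target Text 1"><img alt="target img"/></a>
  </td>
  <td class="title">
    <span class="wlb_wrapper" data-tconst="tt0111161"></span>
    <a href="/target/tt0111161/">Other Text</a>
  </td>
</tr></table>
"""

def get_info(soup):
    """Return one (tconst, title) tuple per table row (hypothetical helper)."""
    rows = []
    for tr in soup.find_all("tr"):
        link = tr.select_one("td.image a[title]")
        span = tr.select_one("td.title span.wlb_wrapper")
        # Skip rows that lack either piece instead of failing on None
        if link is not None and span is not None:
            rows.append((span.get("data-tconst"), link.get("title")))
    return rows

def write_tsv(rows, fileobj):
    """Write a header line plus the rows as tab-delimited text (hypothetical helper)."""
    writer = csv.writer(fileobj, delimiter="\t", lineterminator="\n")
    writer.writerow(["tconst", "title"])
    writer.writerows(rows)

soup = BeautifulSoup(html, "html.parser")
buffer = io.StringIO()  # stand-in for a real file object
write_tsv(get_info(soup), buffer)
print(buffer.getvalue())
```

To write to disk instead, pass a file opened with open(path, "w", newline="") in a with block, which closes the file automatically once the rows are written.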
span.wlb_wrapper is a CSS selector that matches <span class="wlb_wrapper" data-caller-name="search" data-size="small" data-tconst="tt0111161">. Refer to this and this for more information on selectors.

In your Python code, change span = td.select('span.wlb_wrapper') to span = td.select('span'), then also try span = td.select('span.year_type'), and see what each one returns.

If you try the above and inspect what span holds, you will get what you want.
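
One likely reason the attempt in the question finds nothing is that a.title is a class selector: it looks for an a tag with class="title", not one with a title attribute. The sketch below contrasts it with the attribute selector a[title]. It assumes bs4 is installed; the trimmed html string and the variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the first cell from the question (illustrative only)
html = (
    '<table><tr><td class="image">'
    '<a href="/target/tt0111161/" title="Target Text 1">'
    '<img alt="target img"/></a>'
    '</td></tr></table>'
)
soup = BeautifulSoup(html, "html.parser")

# "a.title" means <a class="title">, which this markup does not contain
by_class = soup.select("td.image a.title")

# "a[title]" means an <a> that has a title attribute
by_attr = soup.select("td.image a[title]")

print(len(by_class))
print(by_attr[0].get("title"))
```

Since there is only one matching link per cell, it sits at index 0 of the result list, so indexing with [2] as in the question would raise an IndexError even with the right selector.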

1 Comment

I've edited the body text to show what I attempted to do in my code. I've tried changing span.wlb_wrapper to just span, but it now just returns a value of "None".
