I'm new to HTML parsers. I'm trying to parse the source of the web page at http://www.quora.com/How-many-internships-are-necessary-for-a-B-Tech-student in order to get the text of the answer_count element.

I tried it in the following way:

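import urllib2
from bs4 import BeautifulSoup

url = 'http://www.quora.com/How-many-internships-are-necessary-for-a-B-Tech-student'

# Fetch the page and parse the response
q = urllib2.urlopen(url)
soup = BeautifulSoup(q)
# Look for <div class="answer_count"> elements
divs = soup.find_all('div', class_='answer_count')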

But the list 'divs' comes back empty. Why is that? What am I doing wrong? How can I get '2 Answers' as the result?

  • I didn't find any answer_count class. Commented Jul 23, 2014 at 11:57
  • There is an answer_count class in the source code! Here's a small snippet: <div class="answer_count">2 Answers<span id="ld_bdnqjl_196692"></span></div> Commented Jul 23, 2014 at 12:04
  • I agree with MA1: there is no answer_count in the source that I load. I think you are looking at the source served to you while logged in, as opposed to what urllib2 grabs. Try viewing the source in Chrome's incognito mode to see whether you still find the div. Commented Jul 23, 2014 at 13:34

1 Answer

You may not be getting the same page in your browser as we are (for example, because you are logged in).

When I view the page you linked in Google Chrome, 'answer_count' appears nowhere in the source code. If it isn't in the source that Chrome shows, BeautifulSoup won't find it either.
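
One way to confirm this from a script, rather than by eye, is to run the HTML through a second, independent parser. The sketch below is only an illustration and rests on a few assumptions: it is written for Python 3 and uses the standard library's html.parser instead of BeautifulSoup, the names ClassFinder and tags_with_class are made up for this example, and the `anonymous` string is a hypothetical stand-in for a logged-out response, not Quora's real markup. If a check like this also finds nothing in the HTML your script downloads, the element is missing from the response itself and the parser is not at fault.

```python
from html.parser import HTMLParser


class ClassFinder(HTMLParser):
    """Collect the tag names of elements that carry a given CSS class."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.matches = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; a value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.matches.append(tag)


def tags_with_class(html, class_name):
    """Return the tag names of all elements in `html` that have `class_name`."""
    finder = ClassFinder(class_name)
    finder.feed(html)
    finder.close()
    return finder.matches


# The snippet quoted in the comments (what a logged-in browser shows).
logged_in = '<div class="answer_count">2 Answers<span id="ld_bdnqjl_196692"></span></div>'
# Hypothetical stand-in for a page served without that element.
anonymous = '<div class="header">How many internships are necessary?</div>'

print(tags_with_class(logged_in, "answer_count"))   # ['div']
print(tags_with_class(anonymous, "answer_count"))   # []
```

To apply it to the real page, pass in the decoded text that your script actually downloaded instead of the two sample strings.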

1 Comment

I would suggest using the Python 'requests' library. You'll be able to log in to a website from within your script.
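
A rough sketch of that idea, under some loud assumptions: the function name fetch_as_logged_in_user, the example.com URLs, and the form field names are all placeholders invented here, not Quora's real login endpoint or fields. Many sites also require CSRF tokens or JavaScript to log in, so a plain form POST like this may not be enough. The function accepts any object that behaves like requests.Session, which keeps the cookies from the login response and sends them with the later GET.

```python
def fetch_as_logged_in_user(session, login_url, page_url, credentials):
    """Log in through `session`, then return the HTML of `page_url`.

    `session` is expected to behave like requests.Session, so cookies set
    by the login response are reused for the page request.
    """
    login_response = session.post(login_url, data=credentials)
    login_response.raise_for_status()  # stop early if the login request failed
    page_response = session.get(page_url)
    page_response.raise_for_status()
    return page_response.text


# Usage, assuming the third-party `requests` package is installed.
# The URLs and form field names below are placeholders, not a real site's:
#
#   import requests
#   with requests.Session() as session:
#       html = fetch_as_logged_in_user(
#           session,
#           "https://example.com/login",
#           "https://example.com/some-page",
#           {"email": "me@example.com", "password": "secret"},
#       )
```

The returned HTML can then be passed to BeautifulSoup exactly as in the question.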
