
I have created a piece of code to scrape an article off the ft.com website.

import requests
import bs4

url = ""
r = requests.get(url)
soup = bs4.BeautifulSoup(r.content, "html.parser")
# Print every div whose id is "storyContent"
for a in soup.find_all('div', {"id": "storyContent"}):
    print(a)

1) On the website, there is a div tag with id "storyContent", but this code produces no output, which means it never entered the loop at all. What might the reason be?
ft.com does not give access to articles without a username and password.
I have logged into ft.com using Chrome.
Suppose my username, password details are the following:
Username : [email protected]
Pass: 12345
I need to know either of the following:
2) How can I provide this authentication in my code?
3) How can I use the session in Chrome (where I'm already logged in) to access the webpage/article details?
4) Is authentication the reason behind getting no output?
5) Ultimately, I am trying to get the article's body out of the webpage.

Thanks!

  • Use Python's mechanize for authentication, fetch the source, and then use Beautiful Soup. Commented Aug 30, 2016 at 11:45
  • mechanize is no longer supported. Commented Aug 30, 2016 at 11:46
  • Try and get the form, then you can submit with r = requests.post('http://httpbin.org/post', data = {'key':'value'}) Commented Aug 30, 2016 at 11:50
  • @DanielLee, could you help me with one more little thing, please? I am trying to extract the p tags within the div tags: for div in soup.find_all('div', {"id":"storyContent"}): and then for p in div.find_all('p'): print p.string. Certain p tags also contain anchors, so p.string returns None for them. How do I get the entire article out? Thanks! Commented Aug 30, 2016 at 12:38
  • Try using get_text() instead of .string: print div.get_text() Commented Aug 30, 2016 at 12:45
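
To illustrate the last two comments: .string is None whenever a tag has more than one child (for example, text plus an anchor), while get_text() joins all the nested text. The following is a minimal sketch on a made-up HTML snippet; the markup and the helper name extract_paragraphs are assumptions for illustration, and the real article page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the article markup; the real page may differ.
html = """
<div id="storyContent">
  <p>Plain paragraph.</p>
  <p>Paragraph with <a href="#">a link</a> inside.</p>
</div>
"""


def extract_paragraphs(html, container_id="storyContent"):
    """Return the text of every <p> inside the div with the given id."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", id=container_id)
    if container is None:
        # The div is missing, e.g. because a login page was served instead.
        return []
    # get_text() concatenates nested strings, so anchors do not break it.
    return [p.get_text() for p in container.find_all("p")]


paragraphs = extract_paragraphs(html)
print("\n".join(paragraphs))
```

An empty list here would suggest that the server returned a page without the storyContent div, which is worth checking before blaming the parsing code.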

1 Answer

Rather, start with this to see what the server actually returns:

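import requests
from bs4 import BeautifulSoup

url = "http://www.ft.com"
r = requests.get(url)
soup = BeautifulSoup(r.content, "html.parser")
# Print the parsed page to inspect what was served (e.g. a login form)
for a in soup:
    print(a)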

Then add a POST request once you find the key:value pairs that the login form requires:

r = requests.post('http://www.ft.com/xxx', data = {'key':'value'})
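
A slightly fuller sketch of the same idea, assuming the site uses a plain HTML login form: a requests.Session keeps the cookies set at login, so later requests are made as the logged-in user. The field names "username" and "password", the helper names, and any URLs passed in are hypothetical; the real ft.com form may use different fields or depend on JavaScript, in which case this approach would not work as written.

```python
import requests
from bs4 import BeautifulSoup


def build_login_payload(form_html, username, password,
                        user_field="username", pass_field="password"):
    """Collect the form's hidden inputs and add the credentials.

    user_field and pass_field are assumed names; inspect the real form.
    """
    soup = BeautifulSoup(form_html, "html.parser")
    form = soup.find("form")
    payload = {}
    if form is not None:
        for inp in form.find_all("input"):
            name = inp.get("name")
            # Hidden inputs often carry tokens the server expects back.
            if name and inp.get("type") == "hidden":
                payload[name] = inp.get("value", "")
    payload[user_field] = username
    payload[pass_field] = password
    return payload


def fetch_article(login_url, article_url, username, password):
    """Log in with a Session so cookies carry over to the article request."""
    with requests.Session() as session:
        login_page = session.get(login_url)
        payload = build_login_payload(login_page.text, username, password)
        session.post(login_url, data=payload)
        return session.get(article_url).text
```

The returned HTML can then be passed to BeautifulSoup as before to look for the storyContent div.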
