Web scraping using Python BeautifulSoup but not getting the value

Question

I am using this script to scrape the author information from ScienceDirect articles, but I get nothing when I try to print the value.

import requests
from bs4 import BeautifulSoup
from urllib import urlopen  # Python 2 import
import csv
import sys
reload(sys)
sys.setdefaultencoding('utf-8')

# Read the URLs from urls.txt and print the text of each AuthorGroups div
with open('urls.txt') as inf:
    urls = (line.strip() for line in inf)
    for url in urls:
        site = urlopen(url)
        soup = BeautifulSoup(site, "lxml")
        for item in soup.find_all("div", {"class": "AuthorGroups"}):
            final = item.text, url
            print final

In urls.txt I used these two URLs:

https://www.sciencedirect.com/science/article/pii/009286749290520M
https://www.sciencedirect.com/science/article/pii/0092867495903682

Does it scrape other fields from ScienceDirect, or does it work with other links in the text file? It could be that ScienceDirect doesn't allow scraping. – CMorgan, Commented Dec 7, 2018 at 8:16
I am not able to fetch anything from ScienceDirect, but the same program works for other journals. I get nothing when trying to print the value, even though it can be found with 'Inspect Element'. – Aishwarya, Commented Dec 7, 2018 at 9:19

ewwink · Accepted Answer · 2018-12-07 08:30:24Z


If BeautifulSoup does not return the expected value, look at the HTML response the server actually sent.
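A minimal Python 3 sketch of that check: the helper below (`summarize_response` is a hypothetical name, not part of BeautifulSoup) reports the HTTP status, the page title, and how many `AuthorGroups` divs the parser actually sees. It uses the built-in `"html.parser"` so it does not depend on lxml. A status such as 403, or a title that looks like an error or verification page, suggests the server sent something other than the article.

```python
from bs4 import BeautifulSoup


def summarize_response(status_code, html, css_class="AuthorGroups"):
    """Summarize a fetched page to see why find_all() came back empty."""
    soup = BeautifulSoup(html, "html.parser")
    # A block or error page often has its own <title>
    title = soup.title.get_text(strip=True) if soup.title else None
    matches = soup.find_all("div", {"class": css_class})
    return {
        "status": status_code,
        "title": title,
        "matches": len(matches),
        "preview": html[:300],  # first characters of the raw HTML
    }
```

For example, after `resp = requests.get(url)`, call `print(summarize_response(resp.status_code, resp.text))` and compare the result with what 'Inspect Element' shows in the browser.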

Your request is being blocked because it needs to send a proper User-Agent header.
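The question fetches pages with `urlopen`, which by default identifies itself as Python-urllib rather than as a browser. If you prefer to stay with the standard library, a minimal Python 3 sketch is to wrap the URL in `urllib.request.Request` with the same header. `build_request` and `fetch_html` are illustrative names, and falling back to UTF-8 when the server declares no charset is an assumption. Note that in Python 3, `urlopen` raises `urllib.error.HTTPError` on a 4xx/5xx response instead of returning the error page.

```python
from urllib.request import Request, urlopen

# Same browser-like User-Agent string as in the answer
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:56.0) "
                  "Gecko/20100101 Firefox/56.0"
}


def build_request(url):
    """Wrap url in a Request that carries the custom User-Agent."""
    return Request(url, headers=HEADERS)


def fetch_html(url, timeout=10):
    """Download url and decode it, falling back to UTF-8 (an assumption)."""
    with urlopen(build_request(url), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

The string returned by `fetch_html(url)` can be passed straight to `BeautifulSoup(..., "lxml")` as in the original loop.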

# .....
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:56.0) Gecko/20100101 Firefox/56.0'}
for url in urls:
    print url
    # send the browser-like User-Agent with every request
    site = requests.get(url, headers=headers).text
    # .....
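Put together, a complete Python 3 version of the fixed script might look like the sketch below. It assumes urls.txt holds one URL per line and reuses the `AuthorGroups` class name from the question, which may not match ScienceDirect's current markup; `extract_author_groups` and `scrape` are hypothetical helper names. A `requests.Session` is used so the header is set once and connections are reused across URLs, and `raise_for_status()` makes a blocked request fail loudly instead of silently yielding nothing.

```python
import requests
from bs4 import BeautifulSoup

# Browser-like User-Agent string taken from the answer
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:56.0) "
                  "Gecko/20100101 Firefox/56.0"
}


def extract_author_groups(html):
    """Return the text of every <div class="AuthorGroups"> in the page."""
    soup = BeautifulSoup(html, "html.parser")
    return [div.get_text(" ", strip=True)
            for div in soup.find_all("div", {"class": "AuthorGroups"})]


def scrape(path="urls.txt"):
    """Fetch each URL listed in path and yield (author_text, url) pairs."""
    with open(path) as inf:
        urls = [line.strip() for line in inf if line.strip()]
    with requests.Session() as session:
        session.headers.update(HEADERS)  # replaces the default python-requests agent
        for url in urls:
            resp = session.get(url, timeout=10)
            resp.raise_for_status()  # fail loudly instead of parsing an error page
            for text in extract_author_groups(resp.text):
                yield text, url
```

To run it, iterate over `scrape()` and print each `(author_text, url)` pair, as the original loop did with `final`.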


Comments

When I included the headers, it worked. Thank you. – Aishwarya

You're welcome. If it solved your problem, please mark the answer as accepted. – ewwink
