I successfully scraped the first page of the website, but when I tried to scrape multiple pages, the code ran without errors and the result was completely wrong.

Code:

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin
for num in range(1,15):
    res = requests.get('http://www.abcde.com/Part?Page={num}&s=9&type=%8172653').text
    soup = BeautifulSoup(res,"lxml")
    for item in soup.select(".article-title"):
        print(urljoin('http://www.abcde.com',item['href']))

Only one number changes in each page's URL, for example:

http://www.abcde.com/Part?Page=1&s=9&type=%8172653
http://www.abcde.com/Part?Page=2&s=9&type=%8172653

There are 14 pages like this in total.

My code ran, but it just printed the first page's URLs 14 times over. I expected the loop to print the different URLs from each of the pages.

Comment:
    You're not actually formatting the string to put the number into it. You either need to prefix the string with f if you're using Python 3.6+, or otherwise call .format(num=num) on the string to put the page number in. Commented Oct 12, 2017 at 10:07

1 Answer

As Jon Clements pointed out, format the URL as below:

res = requests.get('http://www.abcde.com/Part?Page={}&s=9&type=%8172653'.format(num)).text

You can find more about Python format strings at pyformat.info.
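To see the difference without hitting the site, here is a minimal sketch that only builds the URLs. The names TEMPLATE and page_urls are made up for illustration; the URL and the page range 1 to 14 come from the question. It shows that the unformatted string still contains the literal text {num}, which is why every request was for the same URL, and that .format and an f-string (Python 3.6+) produce the same result.

```python
# Hypothetical names for illustration; the URL pattern is the one from the question.
TEMPLATE = 'http://www.abcde.com/Part?Page={num}&s=9&type=%8172653'


def page_urls(first=1, last=14):
    """Build one URL per page number, from first to last inclusive."""
    return [TEMPLATE.format(num=num) for num in range(first, last + 1)]


# Without formatting, the placeholder is sent to the server as literal text,
# so every iteration of the loop requests exactly the same URL.
unformatted = [TEMPLATE for num in range(1, 15)]

urls = page_urls()

# An f-string does the same substitution inline (Python 3.6+).
num = 2
f_string_url = f'http://www.abcde.com/Part?Page={num}&s=9&type=%8172653'

print(len(set(unformatted)))  # number of distinct URLs when nothing is substituted
print(len(set(urls)))         # number of distinct URLs after formatting
print(urls[0])
print(urls[-1])
```

In the scraping loop from the question, each entry of urls would then be passed to requests.get in place of the hard-coded string.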

2 Comments

Hi! Thanks for the info. I tried it, but it raised AttributeError: 'Response' object has no attribute 'format'
Sorry, my bad. I missed a closing parenthesis at the end. I've updated the code.
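The AttributeError above is what you would expect if the closing parenthesis of requests.get(...) ends up before .format, so that .format is called on the returned Response instead of on the URL string. The earlier version of the answer is not shown here, so this is presumably what happened rather than a confirmed reproduction. The sketch below uses a stand-in class and function (FakeResponse and fake_get, both hypothetical) in place of requests so it runs without a network connection.

```python
class FakeResponse:
    """Hypothetical stand-in for requests.Response, so no network is needed."""

    def __init__(self, url):
        self.url = url
        self.text = '<html></html>'


def fake_get(url):
    """Hypothetical stand-in for requests.get: returns a response-like object."""
    return FakeResponse(url)


template = 'http://www.abcde.com/Part?Page={}&s=9&type=%8172653'
num = 2

# Wrong: the call to fake_get is closed before .format, so .format is looked up
# on the response object, which has no such attribute.
try:
    fake_get(template).format(num)
    error = None
except AttributeError as exc:
    error = str(exc)

# Right: format the string first, then pass the finished URL to the request.
response = fake_get(template.format(num))
html = response.text

print(error)
print(response.url)
```

The rule of thumb is that .format(num) has to sit inside the parentheses of the get call, directly after the closing quote of the URL.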
