0

I want to take website name from user input and maximum no. of pages that he want to crawl for crawling website...but can't getting any solution..here's my code

import requests
from bs4 import *
from urllib import request


url1 = input("Enter url you want to crawl:")
max_pages1 = int(input("Enter no. of pages you want to crawl:"))


def web_crawler(max_pages,url):
   page = 1
   while page <= max_pages:
      url4 = str(url) + str(page)
      url_get = requests.get(url4)
      plain_text = url_get.text
      soup = BeautifulSoup(plain_text,"html.parser")
      for a in soup.findAll('a',{'rel':'bookmark'}):
          href = a.get('href')
          title = a.string
          #print(title)
          print(href)
          #info_about_web_pages(href)
      page +=1

def info_about_web_pages(url):
   url_get = requests.get(url)
   plain_text = url_get.text
   soup = BeautifulSoup(plain_text,"html.parser" )
   links = set()
   for about in soup.findAll('a'):
       href = about.get('href')
       links.update([href])

   print(links)

web_crawler(max_pages1,url1)

It shows me nothing in output

2
  • do you have an example of the url you are trying to do this for? Are you sure that an anchor with the attribute 'rel': 'bookmark' is in its source code? Commented Feb 19, 2017 at 17:38
  • Yes url is in rel:bookmark.... .. url is fonearena.com/blog Commented Feb 20, 2017 at 14:45

1 Answer 1

1

If there is no anchor with the attributes you are trying to find in the html source code then this will always print nothing. try printing soup.prettify() and see if the tag you are looking for even exists. More often than not when I am not printing the values I'm expecting it's because the value does not have the attributes I am looking for.

Sign up to request clarification or add additional context in comments.

1 Comment

in the line after soup = BeautifulSoup(plain_text,"html.parser" ) put print(str(soup.prettify()))

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.