
I am trying to scrape some text from a webpage and save it to text files using the following code (the links are read from a text file called links.txt):

import requests
import csv
import random
import string
import re

from bs4 import BeautifulSoup

#Create random string of specific length
def randStr(chars = string.ascii_uppercase + string.digits, N=10):
    return ''.join(random.choice(chars) for _ in range(N))

with open("links.txt", "r") as a_file:
    for line in a_file:
        stripped_line = line.strip()
        endpoint = stripped_line
        response = requests.get(endpoint)
        data = response.text
        soup = BeautifulSoup(data, "html.parser")
        for pictags in soup.find_all('col-md-2'):
            lastfilename = randStr()
            file = open(lastfilename + ".txt", "w")
            file.write(pictags.txt)
            file.close()
            print(stripped_line)

The webpage contains the following element:

<div class="col-md-2">

The problem is that after running the code nothing happens, and I do not receive any error.

  • What are you trying to scrape from that page? Could you explain? (Commented Aug 26, 2021 at 7:36)

3 Answers


To get all keyword text from the page into a file, you can do:

import requests
from bs4 import BeautifulSoup

url = "http://www.mykeyworder.com/keywords?tags=dog&exclude=&language=en"

soup = BeautifulSoup(requests.get(url).content, "html.parser")

with open("data.txt", "w") as f_out:
    for inp in soup.select('input[type="checkbox"]'):
        print(inp["value"], file=f_out)

This creates data.txt with content:

dog
animal
canine
pet
cute
puppy
happy
young
adorable

...and so on.
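To check the selector without hitting the network, here is a minimal sketch that runs the same input[type="checkbox"] selection against a small inline HTML snippet. The snippet is a made-up stand-in for the real page's markup (an assumption; the live page may be structured differently or may have changed since).

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the keyword page's markup; the real page may differ.
html = """
<form>
  <input type="checkbox" value="dog">
  <input type="checkbox" value="animal">
  <input type="text" value="not a keyword">
  <input type="checkbox" value="canine">
</form>
"""

soup = BeautifulSoup(html, "html.parser")

# The attribute selector keeps only checkbox inputs, so the text input is skipped.
keywords = [inp["value"] for inp in soup.select('input[type="checkbox"]')]
print(keywords)  # ['dog', 'animal', 'canine']
```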

From the BeautifulSoup documentation, you can see that your line for pictags in soup.find_all('col-md-2') searches for elements whose tag name is 'col-md-2', not elements with that class name. In other words, your code searches for elements like <col-md-2></col-md-2>.

Fix that line and try again with:

for pictags in soup.find_all(class_='col-md-2'):
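The difference between a positional name and the class_ keyword can be seen offline with a tiny inline snippet (the HTML below is invented for illustration):

```python
from bs4 import BeautifulSoup

html = """
<div class="row">
  <div class="col-md-2">first</div>
  <div class="col-md-2 text-center">second</div>
  <div class="col-md-4">other</div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# A positional string argument is matched against the tag *name*,
# so this looks for <col-md-2> elements and finds none.
by_name = soup.find_all("col-md-2")

# class_ is matched against the class attribute. Because class is
# multi-valued, class="col-md-2 text-center" matches as well.
by_class = soup.find_all(class_="col-md-2")

print(len(by_name))                    # 0
print([tag.text for tag in by_class])  # ['first', 'second']
```

The empty result from the first call is why the original loop body never runs and no error is raised.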

2 Comments

Thanks, I tried your recommendation and got this error: "file.write(pictags.txt) TypeError: write() argument must be str, not Tag". Sorry to bother you; any suggestion is really appreciated.
@KatherineElizabethKath: If you want the text content of the retrieved HTML tag, try file.write(pictags.text)
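Putting the answer and the comment together (class_ instead of a positional tag name, and the tag's text instead of .txt), the question's script might look roughly like the sketch below. Assumptions: links.txt holds one URL per line, and the page_N_M.txt file names are a hypothetical replacement for the randStr() names (they cannot collide). The parsing step is split into a helper so it can be tried on plain HTML without network access.

```python
from pathlib import Path

from bs4 import BeautifulSoup


def save_col_md_2_text(html, out_dir, prefix="page"):
    """Write the text of every class="col-md-2" element to its own file.

    Returns the list of paths written. The naming scheme is a hypothetical
    stand-in for the question's randStr() file names.
    """
    soup = BeautifulSoup(html, "html.parser")
    out_dir = Path(out_dir)
    written = []
    # class_ (not a positional argument) matches the class attribute.
    for i, tag in enumerate(soup.find_all(class_="col-md-2")):
        path = out_dir / f"{prefix}_{i}.txt"
        # get_text() returns the tag's text as a str (.text is shorthand for it).
        path.write_text(tag.get_text(strip=True), encoding="utf-8")
        written.append(path)
    return written


def main():
    import requests  # imported here so the helper above works without it

    with open("links.txt", "r") as a_file:
        for page_index, line in enumerate(a_file):
            url = line.strip()
            if not url:
                continue  # skip blank lines
            response = requests.get(url, timeout=30)
            # Surface HTTP errors (e.g. 404) instead of parsing an error page.
            response.raise_for_status()
            paths = save_col_md_2_text(response.text, ".", prefix=f"page{page_index}")
            print(url, "->", len(paths), "file(s)")
```

Calling main() from a script reads links.txt and writes the files into the current directory; printing the count per URL makes an empty match visible instead of silently producing nothing.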

You can match elements by their attributes: pass a dictionary of the desired attributes to the attrs parameter of find_all.

pictags = soup.find_all(attrs={'class':'col-md-2'})

This finds all elements whose class attribute includes 'col-md-2'.
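The attrs dictionary, the class_ keyword, and a CSS class selector via select() all match individual class tokens. A small offline check (inline HTML invented for illustration) suggests they pick out the same elements, and that a similar-looking class such as col-md-20 is not matched:

```python
from bs4 import BeautifulSoup

html = """
<div class="col-md-2">a</div>
<span class="col-md-2 hidden">b</span>
<div class="col-md-20">c</div>
"""
soup = BeautifulSoup(html, "html.parser")

via_attrs = soup.find_all(attrs={"class": "col-md-2"})
via_class_kw = soup.find_all(class_="col-md-2")
via_css = soup.select(".col-md-2")

# Any tag type matches; only exact class tokens count, so col-md-20 is excluded.
print([t.text for t in via_attrs])  # ['a', 'b']
```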
