
I'm scraping data off of a website and I need to insert each li element text into its own row on a MySQL table.

Source

 https://printcopy.info/?mod=erc&brand=Kyocera&model=TASKalfa+2460ci&page=1

This code prints the text of each li:

parent = driver.find_elements_by_class_name("ercRow")
for link in parent:
    links = link.find_elements_by_tag_name('li')
    for l in links:
        print(l.text)

Results

Code:...
Description:...
Cause:...
Remedy:...

I now need to turn each li into its own variable so I can insert them into a MySQL table like this:

id |   code       |     desc       |    caus      |   reme
 1    code...           desc...         cause...      reme..
 2    code...           desc...         cause...      reme..
 3    code...           desc...         cause...      reme..

I tried:

parent = driver.find_elements_by_class_name("ercRow")
for link in parent:
    links = link.find_elements_by_tag_name('li')
    for l in links:
        print(l[0].text)
        print(l[1].text)
        print(l[2].text)
        print(l[3].text)

Error:

    print(l[0].text)
TypeError: 'WebElement' object is not subscriptable

Any help would be greatly appreciated. Thank you.

1 Answer


There's no need to use Selenium here: the desired content is already in the page source, even with JavaScript disabled, so we can fetch it with requests and parse it with BeautifulSoup, for example:

from bs4 import BeautifulSoup as bs
import requests

mod = "erc"
brand = "Kyocera"
model = "TASKalfa+2460ci"

# fetch the first page so we can read the page selector
u = f"https://printcopy.info/?mod={mod}&brand={brand}&model={model}"
soup = bs(requests.get(u).text, "html5lib")

# the last <option> of the page selector holds the total number of pages;
# add 1 so that range(1, pages) below includes the final page
pages = int(soup.find("select", {"id": "selectNumPages"}).findAll("option")[-1].text) + 1

for page in range(1, pages):
    u = f"https://printcopy.info/?mod={mod}&brand={brand}&model={model}&page={page}"
    soup = bs(requests.get(u).text, "html5lib")
    ercRow = soup.findAll("ul", {"class": "ercRow"})
    for ul in ercRow:
        lis = ul.findAll("li")
        # drop the "Label:" prefix by splitting on the first colon
        # (str.strip(chars) removes a set of characters, not a prefix)
        code, description, causes, remedy = (
            li.text.split(":", 1)[-1].strip() for li in lis[:4]
        )
        print(code, description, causes, remedy, sep="\n")
        # insert the values on db...

Output:

C0070
FAX PWB incompatible detection error
Abnormal detection of FAX control PWB incompatibility in the initial communication with the FAX control PWB, any normal communication command is not transmitted.
1 Checking the FAX PWB The incompatible FAX PWB is installed. Install the FAX PWB for the applicable model. 2 Firmware upgrade The FAX firmware is faulty. Reinstall the FAX firmware. 3 Replacing the main PWB The main PWB is faulty. Replace the main PWB.
C0100
Backup memory device error
An abnormal status is output from the flash memory.
1 Resetting the main power The flash memory does not operate properly. Turn off the power switch and unplug the power plug. After 5s passes, reconnect the power plug and turn on the power switch. 2 Checking the main PWB The connector or the FFC is not connected properly. Or, the wire, FFC, the PWB is faulty. Clean the terminal of the connectors on the main PWB, reconnect the connector of the wire, and reconnect the FFC terminal. If the wire or the FFC is faulty, repair or replace them. If not resolved, replace the main PWB.

...
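
For the `# insert the values on db...` step, here is a minimal sketch of one way to do it. It uses Python's built-in sqlite3 as a stand-in so that it runs without a server; that swap, the table name `error_codes`, and the helpers `parse_row` and `save_rows` are my own assumptions, not something from the question. With a MySQL driver such as mysql-connector-python or PyMySQL the pattern should be the same, except that the placeholder is `%s` instead of `?` and the id column would be declared as `INT AUTO_INCREMENT PRIMARY KEY`. I named the column `description` rather than `desc` because `DESC` is a reserved word in MySQL.

```python
import sqlite3

# Column order assumed to match the four li elements of each ul.ercRow
COLUMNS = ("code", "description", "cause", "remedy")


def parse_row(li_texts):
    """Turn the four li texts of one row into a tuple without the labels.

    Hypothetical helper: splits on the first colon only, so colons inside
    the value itself are kept.
    """
    values = tuple(text.split(":", 1)[-1].strip() for text in li_texts)
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} items, got {len(values)}")
    return values


def save_rows(conn, rows):
    """Create the table if needed and insert all rows in one batch."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS error_codes ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "code TEXT, description TEXT, cause TEXT, remedy TEXT)"
    )
    # Parameterized query: the driver escapes the values, never format
    # scraped text into the SQL string yourself
    conn.executemany(
        "INSERT INTO error_codes (code, description, cause, remedy) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()


# Sample data shaped like the scraped li texts (shortened for illustration)
sample = [
    "Code: C0070",
    "Description: FAX PWB incompatible detection error",
    "Cause: Abnormal detection of FAX control PWB incompatibility",
    "Remedy: 1 Checking the FAX PWB",
]

conn = sqlite3.connect(":memory:")  # in-memory stand-in for the MySQL database
save_rows(conn, [parse_row(sample)])
print(conn.execute("SELECT id, code, description FROM error_codes").fetchall())
```

In the scraping loop you would collect `parse_row([li.text for li in lis])` into a list for each `ul` and call `save_rows` once per page, which keeps the number of round trips to the database low.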

6 Comments

Wow, thank you so much. One last question: what if the page range is more than 20? For some of the other models it's 37 pages, for some it's 20, etc.
You're welcome :) You can type any number you want, e.g. range(37). If my answer helped you, please consider accepting it as the correct answer and giving it a +1, thanks!
Done. Thanks again. I just wanted to see if there was a way for it to automatically get the number of pages instead of me manually giving the range.
Thanks. I've updated the answer to dynamically get the total number of pages to scrape.
I'm definitely not ;), make sure you catch the errors it may throw with try, except. GL!