
I've been building this scraper (with some massive help from users here) to get data on some companies' debt to the public sector. I can already reach the site, enter the desired search parameters, and scrape the first 50 results (out of 300). The problem I've encountered is that this page's pagination has the following characteristics:

  1. It has no next-page button
  2. The URL doesn't change when you move between pages
  3. The pagination is handled by JavaScript

Here's the code so far:

from selenium import webdriver

path_driver = "C:/Users/CS330584/Documents/Documentos de Defesa da Concorrência/Automatização de Processos/chromedriver.exe"
website = "https://sat.sef.sc.gov.br/tax.NET/Sat.Dva.Web/ConsultaPublicaDevedores.aspx"
value_search = "300"
final_table = []


driver = webdriver.Chrome(path_driver)
driver.get(website)

# Enter the desired number of results and run the search
search_max = driver.find_element_by_id("Body_Main_Main_ctl00_txtTotalDevedores")
search_max.send_keys(value_search)
btn_consult = driver.find_element_by_id("Body_Main_Main_ctl00_btnBuscar")
btn_consult.click()

driver.implicitly_wait(10)

# Scrape the three columns of the results grid (only the first 50 rows show)
cnpjs = driver.find_elements_by_xpath("//*[@id='Body_Main_Main_grpDevedores_gridView']/tbody/tr/td[1]")
empresas = driver.find_elements_by_xpath("//*[@id='Body_Main_Main_grpDevedores_gridView']/tbody/tr/td[2]")
dividas = driver.find_elements_by_xpath("//*[@id='Body_Main_Main_grpDevedores_gridView']/tbody/tr/td[3]")
for i in range(len(empresas)):
    temp_data = {'CNPJ': cnpjs[i].text,
                 'Empresas': empresas[i].text,
                 'Divida': dividas[i].text}
    final_table.append(temp_data)

How can I navigate through the pages in order to scrape their data? Thank you all for the help!

  • Have you checked to see if the government offers an API where you can fetch this directly, instead of hacking it with scraping? Commented Apr 27, 2022 at 19:21
  • The benefit of Selenium is that you can inject keystrokes identical to what you would type as a human. Whatever you do as a human to trigger the next page, you can inject with Selenium. Commented Apr 27, 2022 at 19:23

1 Answer


If you inspect the page and watch what happens when you click a page number, you'll see that the anchor's href actually executes some JavaScript. It looks like this:

<a href="javascript:GridView_ScrollToTop(&quot;Body_Main_Main_grpDevedores_gridView&quot;);__doPostBack('ctl00$ctl00$ctl00$Body$Main$Main$grpDevedores$gridView','Page$5')"><font style="vertical-align: inherit;"><font style="vertical-align: inherit;">6</font></font></a>

But if you take that JavaScript out of the href attribute (and decode the &quot; entities back into quotation marks), you'll see two function calls that look like this:

GridView_ScrollToTop("Body_Main_Main_grpDevedores_gridView");
__doPostBack('ctl00$ctl00$ctl00$Body$Main$Main$grpDevedores$gridView','Page$5');

Now, I didn't take the time to analyze these functions in depth, and you don't really need to: the first call just scrolls the browser to the top, while the second triggers the post-back that loads the next page of data. For your purposes, only the second call matters.

You can experiment with this in the browser: perform your search, then paste the call into the JS console, swapping in the number of the page you want to look at.

If you can do it via JS in the console on the webpage, you can do it with Selenium. You would do something like this to "click" each page:

for i in range(1, 7):
    js = "__doPostBack('ctl00$ctl00$ctl00$Body$Main$Main$grpDevedores$gridView','Page$" + str(i) + "');"
    driver.execute_script(js)
    # do scraping stuff here
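To flesh that out, here's a minimal sketch of a full pagination loop. A few assumptions on my part: six pages of 50 results, Selenium 4's `By` locator style, an explicit staleness wait to detect when the post-back has replaced the table, and the helper names (`postback_js`, `scrape_all_pages`) are mine, not from the page:

```python
# Sketch: fire the grid's __doPostBack call for each page, wait for the
# old table to go stale, then re-scrape the rows. Adjust `pages` to match
# your result count (assumed here: 300 results at 50 per page).

TARGET = 'ctl00$ctl00$ctl00$Body$Main$Main$grpDevedores$gridView'

def postback_js(page: int) -> str:
    """Build the JavaScript call that loads the given result page."""
    return f"__doPostBack('{TARGET}','Page${page}');"

def scrape_all_pages(driver, pages=6):
    """Collect CNPJ/Empresas/Divida rows from every page of the grid."""
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    rows_xpath = "//*[@id='Body_Main_Main_grpDevedores_gridView']/tbody/tr"
    table = []
    for page in range(1, pages + 1):
        if page > 1:  # page 1 is already displayed after the search
            old_row = driver.find_element(By.XPATH, rows_xpath)
            driver.execute_script(postback_js(page))
            # Wait until the post-back has replaced the old table body
            WebDriverWait(driver, 10).until(EC.staleness_of(old_row))
        for row in driver.find_elements(By.XPATH, rows_xpath):
            cells = row.find_elements(By.TAG_NAME, "td")
            if len(cells) >= 3:
                table.append({'CNPJ': cells[0].text,
                              'Empresas': cells[1].text,
                              'Divida': cells[2].text})
    return table
```

The staleness wait matters because `__doPostBack` reloads the grid asynchronously; scraping immediately after `execute_script` can read the old page's rows.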