
I've been building this scraper (with some massive help from users here) to get data on some companies' debt to the public sector. I can already reach the site, enter the desired search parameters, and scrape the first 50 results (out of 300). The problem I've encountered is that this page's pagination has the following characteristics:

  1. It has no next-page button
  2. The URL doesn't change when you move between pages
  3. The pagination is handled by JavaScript

Here's the code so far:

from selenium import webdriver

path_driver = "C:/Users/CS330584/Documents/Documentos de Defesa da Concorrência/Automatização de Processos/chromedriver.exe"
website = "https://sat.sef.sc.gov.br/tax.NET/Sat.Dva.Web/ConsultaPublicaDevedores.aspx"
value_search = "300"
final_table = []


driver = webdriver.Chrome(path_driver)
driver.get(website)

# Enter the desired number of results and run the search
search_max = driver.find_element_by_id("Body_Main_Main_ctl00_txtTotalDevedores")
search_max.send_keys(value_search)
btn_consult = driver.find_element_by_id("Body_Main_Main_ctl00_btnBuscar")
btn_consult.click()

driver.implicitly_wait(10)

# Scrape the three columns of the results grid (only the first 50 rows show)
cnpjs = driver.find_elements_by_xpath("//*[@id='Body_Main_Main_grpDevedores_gridView']/tbody/tr/td[1]")
empresas = driver.find_elements_by_xpath("//*[@id='Body_Main_Main_grpDevedores_gridView']/tbody/tr/td[2]")
dividas = driver.find_elements_by_xpath("//*[@id='Body_Main_Main_grpDevedores_gridView']/tbody/tr/td[3]")
for i in range(len(empresas)):
    temp_data = {'CNPJ': cnpjs[i].text,
                 'Empresas': empresas[i].text,
                 'Divida': dividas[i].text}
    final_table.append(temp_data)

How can I navigate through the pages in order to scrape their data? Thank you all for the help!

  • Have you checked to see if the government offers an API where you can fetch this directly, instead of hacking it with scraping? Commented Apr 27, 2022 at 19:21
  • The benefit of Selenium is that you can inject keystrokes identical to what you would type as a human. Whatever you do as a human to trigger the next page, you can inject with Selenium. Commented Apr 27, 2022 at 19:23

1 Answer


If you inspect the page and watch what happens when you click a page number, you'll see that the anchor's href actually executes some JavaScript. It looks like this:

<a href="javascript:GridView_ScrollToTop(&quot;Body_Main_Main_grpDevedores_gridView&quot;);__doPostBack('ctl00$ctl00$ctl00$Body$Main$Main$grpDevedores$gridView','Page$5')"><font style="vertical-align: inherit;"><font style="vertical-align: inherit;">6</font></font></a>

But if you take that JavaScript out of the href attribute (and decode the &quot; entities back into quotation marks), you'll see two function calls that look like this:

GridView_ScrollToTop("Body_Main_Main_grpDevedores_gridView");
__doPostBack('ctl00$ctl00$ctl00$Body$Main$Main$grpDevedores$gridView','Page$5');

Now, I didn't take the time to analyze these functions in depth, and you don't really need to: the first call just scrolls the browser to the top, while the second triggers the post-back that loads the next page of data. For your purposes, only the second call matters.

You can experiment with this in the browser: perform your search, then paste the call into the JS console, swapping in the number of the page you want to look at.

If you can do it via JS in the console on the webpage, you can do it with Selenium. You would do something like this to "click" each page:

for i in range(1, 7):
    js = "__doPostBack('ctl00$ctl00$ctl00$Body$Main$Main$grpDevedores$gridView','Page$" + str(i) + "');"
    driver.execute_script(js)
    # do scraping stuff here
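To flesh that out, here's a minimal sketch of a full pagination loop. A few assumptions on my part: six pages of 50 results, Selenium 4's `By` locator style, an explicit staleness wait to detect when the post-back has replaced the table, and the helper names (`postback_js`, `scrape_all_pages`) are mine, not from the page:

```python
# Sketch: fire the grid's __doPostBack call for each page, wait for the
# old table to go stale, then re-scrape the rows. Adjust `pages` to match
# your result count (assumed here: 300 results at 50 per page).

TARGET = 'ctl00$ctl00$ctl00$Body$Main$Main$grpDevedores$gridView'

def postback_js(page: int) -> str:
    """Build the JavaScript call that loads the given result page."""
    return f"__doPostBack('{TARGET}','Page${page}');"

def scrape_all_pages(driver, pages=6):
    """Collect CNPJ/Empresas/Divida rows from every page of the grid."""
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    rows_xpath = "//*[@id='Body_Main_Main_grpDevedores_gridView']/tbody/tr"
    table = []
    for page in range(1, pages + 1):
        if page > 1:  # page 1 is already displayed after the search
            old_row = driver.find_element(By.XPATH, rows_xpath)
            driver.execute_script(postback_js(page))
            # Wait until the post-back has replaced the old table body
            WebDriverWait(driver, 10).until(EC.staleness_of(old_row))
        for row in driver.find_elements(By.XPATH, rows_xpath):
            cells = row.find_elements(By.TAG_NAME, "td")
            if len(cells) >= 3:
                table.append({'CNPJ': cells[0].text,
                              'Empresas': cells[1].text,
                              'Divida': cells[2].text})
    return table
```

The staleness wait matters because `__doPostBack` reloads the grid asynchronously; scraping immediately after `execute_script` can read the old page's rows.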