
I am working on website automation and want to navigate through different pages. The problem is that the website appears to be built with Angular, and the pagination control calls a JS function through an onClick handler.

The HTML code is:

<li ng-if="directionLinks" ng-class="{ disabled : pagination.current == pagination.last }" class="ng-scope"><a href="" ng-click="setCurrent(pagination.current + 1)" class="xh-highlight">›</a></li>
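Since positional selectors shift as the visible page numbers change, the `ng-click` attribute in the snippet above is a more stable hook. Here is a minimal sketch that parses the snippet with the standard library to pull that attribute out; the Selenium call in the comment is an assumed usage, not tested against the live site:

```python
# Sketch: the ng-click attribute is a stable hook for the "next" arrow,
# unlike positional selectors that move as the page list shifts.
import xml.etree.ElementTree as ET

snippet = ('<li ng-if="directionLinks" '
           'ng-class="{ disabled : pagination.current == pagination.last }" '
           'class="ng-scope">'
           '<a href="" ng-click="setCurrent(pagination.current + 1)" '
           'class="xh-highlight">&#8250;</a></li>')

li = ET.fromstring(snippet)
next_link = li.find('a')

# XPath built from that attribute; with Selenium one would (assumed usage):
#   driver.find_element(By.XPATH, next_xpath).click()
next_xpath = "//a[@ng-click='%s']" % next_link.get('ng-click')
print(next_xpath)  # //a[@ng-click='setCurrent(pagination.current + 1)']
```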

Edited:

Website Link: https://jobee.pk/jobs-in-pakistan

Code Tried so far:

from selenium import webdriver
import time

class JobeePK:
    def __init__(self):
        # self.url = ""
        pass

    def driver(self):
        driver = webdriver.Chrome()
        driver.maximize_window()
        time.sleep(1)
        return driver

    # https://www.rozee.pk/job/jsearch/q/all/fc/1185/fpn/
    def extractData(self, search_link, total_pages):
        driver = self.driver()
        driver.get(search_link)
        time.sleep(5)

        for page_number in range(0, total_pages):
            # click the "next page" arrow (selector here is a placeholder)
            driver.find_element_by_css_selector("a.xh-highlight").click()
            time.sleep(10)


if __name__ == '__main__':
    jb = JobeePK()
    url = "https://jobee.pk/jobs-in-pakistan"
    total_pages = 128
    jb.extractData(url, total_pages)

Please suggest a solution to this problem. Thanks.

  • Consider using the selenium library. Commented Jun 21, 2019 at 18:22
  • I am using the selenium library. I tried CSS selectors and XPath to click the element, but the element's position keeps changing because the site disables continuous page numbering. Pages range from 1–128, so after reaching page 4 the element's location changes. Commented Jun 21, 2019 at 18:25
  • Please post some code. Commented Jun 21, 2019 at 18:26
  • I have edited the post, can you please take a look? Commented Jun 21, 2019 at 18:57

1 Answer


In such cases, it is always worth taking a closer look at the page to understand how the data is actually loaded.

I did so by opening the developer console in Firefox and watching the XHR traffic in the Network tab.

[Screenshot: Firefox developer tools, Network tab filtered to XHR requests]

... interesting. The page gets its results from an endpoint we can identify.

It returns JSON data, which is great:

{'totalJobs': 2541,
 'jobs': [{'location': [{'jobLocationID': 0,
     'jobID': 24986,
     'countryID': 0,
     'country': 'Pakistan',
     'cityID': None,
     'cityText': 'Karachi',
     'jobShiftID': 0,
     'name': None}],
   'jobID': 24986,
   'jobIDEncrypted': '26cfb27ee6b2abad',
   'title': 'Marketing Officer - Freelancer',
   'jobDescription': '<p>We are growing, energetic, and highly-reputed Public Relation (PR) and Digital Marketing Agency.<br />\nCurrently, we are looking for ...
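To make the shape concrete, here is a trimmed sample mirroring the fields above (only the values shown in the response are used), walked with plain Python; `totalJobs` also tells us how many pages to fetch at 200 results per request:

```python
import math

# A trimmed sample mirroring the response shape shown above
sample = {
    'totalJobs': 2541,
    'jobs': [{
        'jobID': 24986,
        'title': 'Marketing Officer - Freelancer',
        'location': [{'country': 'Pakistan', 'cityText': 'Karachi'}],
    }],
}

# Titles and cities come straight out of the nested structure
titles = [job['title'] for job in sample['jobs']]
cities = [loc['cityText'] for job in sample['jobs'] for loc in job['location']]

# With 200 results per request, totalJobs determines how many pages to fetch
pages = math.ceil(sample['totalJobs'] / 200)
print(titles, cities, pages)  # ['Marketing Officer - Freelancer'] ['Karachi'] 13
```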

Let's use this to write our script:

import requests
import json
import math
import pandas as pd

#The scraping function
def getJobs(pageNumber):

    #Defining the headers
    headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:67.0) Gecko/20100101 Firefox/67.0',
        'Accept': 'application/json, text/plain, */*',
        'Accept-Language': 'fr,fr-FR;q=0.8,en-US;q=0.5,en;q=0.3',
        'X-Requested-With': 'XMLHttpRequest',
        'Content-Type': 'application/json;charset=utf-8',
        'DNT': '1',
        'Connection': 'keep-alive',
        'Referer': 'https://jobee.pk/jobs-in-pakistan',
        'Pragma': 'no-cache'      
    }

    #Setting the right params for the request we will make; pageSize is set to 200 (results per page)
    data = {"model":{"titles":[],"cities":[],"shifts":[],"experinces":[],"careerLevels":[],"functionalAreas":[],"genders":[],"industries":[],"degreeLevels":[],"companies":[]},"pageNumber":1,"pageSize":200}

    #Updating the page number
    data['pageNumber'] = pageNumber
    data = json.dumps(data)

    #Collecting the results
    response = requests.post('https://jobee.pk/job/jobsearch', headers=headers, data=data)

    #Just in case an error shows up
    try:
        return json.loads(response.content)
    except ValueError:
        return {'jobs': []}

#Then let's get the number of pages from page 1
data = getJobs(1)
totalJobs = data['totalJobs']
number_of_pages = math.ceil(totalJobs /200)

#Initializing our job list
jobs_list = []

#Looping through the pages
for pageNumber in range(1,number_of_pages + 1):
    results  = getJobs(pageNumber)

    #If no results, we end the loop
    if len(results['jobs']) == 0:
        break
    else:
        #We append the results stored under the 'jobs' key to our list
        jobs_list += results['jobs']
        print ('Page', pageNumber,'-', len(jobs_list), "jobs collected")

#Let's have a look at the data in a dataframe
df = pd.DataFrame(jobs_list)
print(df)

Output

Page 1 - 200 jobs collected
Page 2 - 400 jobs collected
Page 3 - 600 jobs collected
...

+----+----------------------+--------------------+-------------+----------------------+------------------+----------------------------------------------------+--------+-------------------+----------------------------------------------------+----------------+--------------------+--------------------------+--------------------------+----------------+----------------------------------------------------+--------------------------------------------+----------------------------------------+-----------+
|    |    appliedByDate     |    companyName     | experience  |     expiredDate      | isSalaryVisible  |                  jobDescription                    | jobID  |  jobIDEncrypted   |                     location                       |     logo       | numberOfPositions  |        postDate          |       publishDate        |  salaryRange   |                      skills                        |                   title                    |     titleWithoutSpecialCharacters      | viewCount |
+----+----------------------+--------------------+-------------+----------------------+------------------+----------------------------------------------------+--------+-------------------+----------------------------------------------------+----------------+--------------------+--------------------------+--------------------------+----------------+----------------------------------------------------+--------------------------------------------+----------------------------------------+-----------+
| 0  | 0001-01-01T00:00:00  | Custom House       | Fresh       | 2019-09-19T00:00:00  | True             | <p>We require Mean Stack Developer Interns who...  | 27925  | a0962bea0bc174a1  | [{'jobLocationID': 0, 'jobID': 27925, 'country...  | 14564Logo.jpg  |                 3  | 2019-06-21T14:04:01.363  | 2019-06-21T19:26:24.213  | 5000 - 10000   | [AngularJs, Mongo DB, JavaScript, Node Js, Mea...  | Mean Stack Developer - Intern              | Mean-Stack-Developer-Intern            |        10 |
| 1  | 0001-01-01T00:00:00  | Custom House       | Fresh       | 2019-09-19T00:00:00  | True             | <p>We requires SEO, Digital Marketing and Grap...  | 27924  | 81e4e7f7d672dffd  | [{'jobLocationID': 0, 'jobID': 27924, 'country...  | 14564Logo.jpg  |                 2  | 2019-06-21T14:00:26.45   | 2019-06-21T19:25:04.493  | 5000 - 10000   | [Graphic Design, Search Engine Optimization (S...  | SEO Executive / Graphic Designer - Intern  | SEO-Executive-Graphic-Designer-Intern  |        10 |
| 2  | 0001-01-01T00:00:00  | Printoscan Lahore  | 1 Year      | 2019-09-19T00:00:00  | True             | <p>We require an <strong>Accounts Assistant / ...  | 27923  | 137a257e9e5bbb5d  | [{'jobLocationID': 0, 'jobID': 27923, 'country...  | None           |                 1  | 2019-06-21T13:59:37.373  | 2019-06-21T19:19:07.36   | 15000 - 20000  | [Accounts Services, Administrative Skills, Acc...  | Accounts Assistant / Administrator         | Accounts-Assistant-Administrator       |         6 |
+----+----------------------+--------------------+-------------+----------------------+------------------+----------------------------------------------------+--------+-------------------+----------------------------------------------------+----------------+--------------------+--------------------------+--------------------------+----------------+----------------------------------------------------+--------------------------------------------+----------------------------------------+-----------+

This is what we wanted.
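One optional refinement (a hypothetical helper, not part of the script above): the `location` column still holds nested dicts, so a small pure-Python pass can flatten city names into a plain string before building the dataframe:

```python
# Hypothetical post-processing: flatten each job's nested 'location' list
# into a comma-separated city string before loading into a dataframe.
def flatten_locations(jobs):
    flat = []
    for job in jobs:
        row = dict(job)  # shallow copy so the original dict is untouched
        row['cities'] = ', '.join(
            loc.get('cityText') or '' for loc in job.get('location', [])
        )
        row.pop('location', None)
        flat.append(row)
    return flat

jobs = [{'jobID': 27925,
         'title': 'Mean Stack Developer - Intern',
         'location': [{'cityText': 'Karachi'}, {'cityText': 'Lahore'}]}]
print(flatten_locations(jobs))
# [{'jobID': 27925, 'title': 'Mean Stack Developer - Intern', 'cities': 'Karachi, Lahore'}]
```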


1 Comment

You are a life saver. I am learning, but I also tried this approach; updating the page number was an issue for me, so I left it and went with selenium instead.
