
I am working on website automation and want to navigate through different pages. The problem is that the website appears to be built with Angular, and the pagination control calls a JS function through an onClick handler.

The HTML code is:

<li ng-if="directionLinks" ng-class="{ disabled : pagination.current == pagination.last }" class="ng-scope"><a href="" ng-click="setCurrent(pagination.current + 1)" class="xh-highlight">›</a></li>
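Since positional selectors shift as the visible page numbers change, the `ng-click` attribute in the snippet above is a more stable hook. Here is a minimal sketch that parses the snippet with the standard library to pull that attribute out; the Selenium call in the comment is an assumed usage, not tested against the live site:

```python
# Sketch: the ng-click attribute is a stable hook for the "next" arrow,
# unlike positional selectors that move as the page list shifts.
import xml.etree.ElementTree as ET

snippet = ('<li ng-if="directionLinks" '
           'ng-class="{ disabled : pagination.current == pagination.last }" '
           'class="ng-scope">'
           '<a href="" ng-click="setCurrent(pagination.current + 1)" '
           'class="xh-highlight">&#8250;</a></li>')

li = ET.fromstring(snippet)
next_link = li.find('a')

# XPath built from that attribute; with Selenium one would (assumed usage):
#   driver.find_element(By.XPATH, next_xpath).click()
next_xpath = "//a[@ng-click='%s']" % next_link.get('ng-click')
print(next_xpath)  # //a[@ng-click='setCurrent(pagination.current + 1)']
```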

Edited:

Website Link: https://jobee.pk/jobs-in-pakistan

Code Tried so far:

from selenium import webdriver
import time

class JobeePK:
    def __init__(self):
        # self.url = ""
        pass

    def driver(self):
        driver = webdriver.Chrome()
        driver.maximize_window()
        time.sleep(1)
        return driver

    # https://www.rozee.pk/job/jsearch/q/all/fc/1185/fpn/
    def extractData(self, search_link, total_pages):
        driver = self.driver()
        driver.get(search_link)
        time.sleep(5)

        for page_number in range(0, total_pages):
            # click the "next page" arrow (selector here is a placeholder)
            driver.find_element_by_css_selector("a.xh-highlight").click()
            time.sleep(10)


if __name__ == '__main__':
    jb = JobeePK()
    url = "https://jobee.pk/jobs-in-pakistan"
    total_pages = 128
    jb.extractData(url, total_pages)

Please suggest a solution to this problem. Thanks.

  • Consider using the selenium library. Commented Jun 21, 2019 at 18:22
  • I am using the selenium library. I tried CSS selectors and XPath to click the element, but the element's position keeps changing because the site disables continuous page numbering. Pages range from 1–128, so after reaching page 4 the element's location changes. Commented Jun 21, 2019 at 18:25
  • Please post some code. Commented Jun 21, 2019 at 18:26
  • I have edited the post, can you please take a look? Commented Jun 21, 2019 at 18:57

1 Answer


In such cases, it is always worth taking a closer look at the page to understand how the data is actually loaded.

I did so by opening the developer console in Firefox and watching the XHR traffic in the Network tab.

[Screenshot: Firefox developer tools, Network tab filtered to XHR requests]

... interesting. The page gets its results from an endpoint we can identify.

It returns JSON data, which is great:

{'totalJobs': 2541,
 'jobs': [{'location': [{'jobLocationID': 0,
     'jobID': 24986,
     'countryID': 0,
     'country': 'Pakistan',
     'cityID': None,
     'cityText': 'Karachi',
     'jobShiftID': 0,
     'name': None}],
   'jobID': 24986,
   'jobIDEncrypted': '26cfb27ee6b2abad',
   'title': 'Marketing Officer - Freelancer',
   'jobDescription': '<p>We are growing, energetic, and highly-reputed Public Relation (PR) and Digital Marketing Agency.<br />\nCurrently, we are looking for ...
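To make the shape concrete, here is a trimmed sample mirroring the fields above (only the values shown in the response are used), walked with plain Python; `totalJobs` also tells us how many pages to fetch at 200 results per request:

```python
import math

# A trimmed sample mirroring the response shape shown above
sample = {
    'totalJobs': 2541,
    'jobs': [{
        'jobID': 24986,
        'title': 'Marketing Officer - Freelancer',
        'location': [{'country': 'Pakistan', 'cityText': 'Karachi'}],
    }],
}

# Titles and cities come straight out of the nested structure
titles = [job['title'] for job in sample['jobs']]
cities = [loc['cityText'] for job in sample['jobs'] for loc in job['location']]

# With 200 results per request, totalJobs determines how many pages to fetch
pages = math.ceil(sample['totalJobs'] / 200)
print(titles, cities, pages)  # ['Marketing Officer - Freelancer'] ['Karachi'] 13
```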

Let's use this to write our script:

import requests
import json
import math
import pandas as pd

#The scraping function
def getJobs(pageNumber):

    #Defining the headers
    headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:67.0) Gecko/20100101 Firefox/67.0',
        'Accept': 'application/json, text/plain, */*',
        'Accept-Language': 'fr,fr-FR;q=0.8,en-US;q=0.5,en;q=0.3',
        'X-Requested-With': 'XMLHttpRequest',
        'Content-Type': 'application/json;charset=utf-8',
        'DNT': '1',
        'Connection': 'keep-alive',
        'Referer': 'https://jobee.pk/jobs-in-pakistan',
        'Pragma': 'no-cache'      
    }

    #Setting the right params for the request we will make; pageSize is set to 200 (results per page)
    data = {"model":{"titles":[],"cities":[],"shifts":[],"experinces":[],"careerLevels":[],"functionalAreas":[],"genders":[],"industries":[],"degreeLevels":[],"companies":[]},"pageNumber":1,"pageSize":200}

    #Updating the page number
    data['pageNumber'] = pageNumber
    data = json.dumps(data)

    #Collecting the results
    response = requests.post('https://jobee.pk/job/jobsearch', headers=headers, data=data)

    #Just in case an error shows up
    try:
        return json.loads(response.content)
    except ValueError:
        return {'jobs': []}

#Then let's get the number of pages from page 1
data = getJobs(1)
totalJobs = data['totalJobs']
number_of_pages = math.ceil(totalJobs /200)

#Initializing our job list
jobs_list = []

#Looping through the pages
for pageNumber in range(1,number_of_pages + 1):
    results  = getJobs(pageNumber)

    #If no results, we end the loop
    if len(results['jobs']) == 0:
        break
    else:
        #We append the results stored under the 'jobs' key to our list
        jobs_list += results['jobs']
        print ('Page', pageNumber,'-', len(jobs_list), "jobs collected")

#Let's have a look at the data in a dataframe
df = pd.DataFrame(jobs_list)
print(df)

Output

Page 1 - 200 jobs collected
Page 2 - 400 jobs collected
Page 3 - 600 jobs collected
...

+----+----------------------+--------------------+-------------+----------------------+------------------+----------------------------------------------------+--------+-------------------+----------------------------------------------------+----------------+--------------------+--------------------------+--------------------------+----------------+----------------------------------------------------+--------------------------------------------+----------------------------------------+-----------+
|    |    appliedByDate     |    companyName     | experience  |     expiredDate      | isSalaryVisible  |                  jobDescription                    | jobID  |  jobIDEncrypted   |                     location                       |     logo       | numberOfPositions  |        postDate          |       publishDate        |  salaryRange   |                      skills                        |                   title                    |     titleWithoutSpecialCharacters      | viewCount |
+----+----------------------+--------------------+-------------+----------------------+------------------+----------------------------------------------------+--------+-------------------+----------------------------------------------------+----------------+--------------------+--------------------------+--------------------------+----------------+----------------------------------------------------+--------------------------------------------+----------------------------------------+-----------+
| 0  | 0001-01-01T00:00:00  | Custom House       | Fresh       | 2019-09-19T00:00:00  | True             | <p>We require Mean Stack Developer Interns who...  | 27925  | a0962bea0bc174a1  | [{'jobLocationID': 0, 'jobID': 27925, 'country...  | 14564Logo.jpg  |                 3  | 2019-06-21T14:04:01.363  | 2019-06-21T19:26:24.213  | 5000 - 10000   | [AngularJs, Mongo DB, JavaScript, Node Js, Mea...  | Mean Stack Developer - Intern              | Mean-Stack-Developer-Intern            |        10 |
| 1  | 0001-01-01T00:00:00  | Custom House       | Fresh       | 2019-09-19T00:00:00  | True             | <p>We requires SEO, Digital Marketing and Grap...  | 27924  | 81e4e7f7d672dffd  | [{'jobLocationID': 0, 'jobID': 27924, 'country...  | 14564Logo.jpg  |                 2  | 2019-06-21T14:00:26.45   | 2019-06-21T19:25:04.493  | 5000 - 10000   | [Graphic Design, Search Engine Optimization (S...  | SEO Executive / Graphic Designer - Intern  | SEO-Executive-Graphic-Designer-Intern  |        10 |
| 2  | 0001-01-01T00:00:00  | Printoscan Lahore  | 1 Year      | 2019-09-19T00:00:00  | True             | <p>We require an <strong>Accounts Assistant / ...  | 27923  | 137a257e9e5bbb5d  | [{'jobLocationID': 0, 'jobID': 27923, 'country...  | None           |                 1  | 2019-06-21T13:59:37.373  | 2019-06-21T19:19:07.36   | 15000 - 20000  | [Accounts Services, Administrative Skills, Acc...  | Accounts Assistant / Administrator         | Accounts-Assistant-Administrator       |         6 |
+----+----------------------+--------------------+-------------+----------------------+------------------+----------------------------------------------------+--------+-------------------+----------------------------------------------------+----------------+--------------------+--------------------------+--------------------------+----------------+----------------------------------------------------+--------------------------------------------+----------------------------------------+-----------+

This is what we wanted.
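One optional refinement (a hypothetical helper, not part of the script above): the `location` column still holds nested dicts, so a small pure-Python pass can flatten city names into a plain string before building the dataframe:

```python
# Hypothetical post-processing: flatten each job's nested 'location' list
# into a comma-separated city string before loading into a dataframe.
def flatten_locations(jobs):
    flat = []
    for job in jobs:
        row = dict(job)  # shallow copy so the original dict is untouched
        row['cities'] = ', '.join(
            loc.get('cityText') or '' for loc in job.get('location', [])
        )
        row.pop('location', None)
        flat.append(row)
    return flat

jobs = [{'jobID': 27925,
         'title': 'Mean Stack Developer - Intern',
         'location': [{'cityText': 'Karachi'}, {'cityText': 'Lahore'}]}]
print(flatten_locations(jobs))
# [{'jobID': 27925, 'title': 'Mean Stack Developer - Intern', 'cities': 'Karachi, Lahore'}]
```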


1 Comment

You are a life saver. I am learning, but I also tried this approach; updating the page number was an issue for me, so I left it and went with selenium instead.
