I would like to scrape the table that appears when you go to this website: https://www.eprocure.gov.bd/resources/common/SearcheCMS.jsp

I used the following code based on the example shown here.


import time

import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.firefox.options import Options

options = Options()
options.add_argument('--headless')

# Pass the options to the driver; otherwise the --headless flag is never applied
driver = webdriver.Firefox(executable_path="C:/Users/DefaultUser/AppData/geckodriver.exe", options=options)
driver.get("https://www.eprocure.gov.bd/resources/common/SearcheCMS.jsp")
time.sleep(5)  # give the JavaScript-rendered table time to load
res = driver.execute_script("return document.documentElement.outerHTML")
driver.quit()

soup = BeautifulSoup(res, 'html.parser')
table_rows = soup.find_all('table')[1].find_all('tr')
rows = []
for tr in table_rows:
    td = tr.find_all('td')
    rows.append([i.text for i in td])
delaydata = rows[3:]
df = pd.DataFrame(delaydata, columns=['S. No.', 'Ministry, Division, Organization PE', 'Procurement Nature, Type & Method', 'Tender/Proposal ID, Ref No., Title & Publishing Date', 'Contract Awarded To', 'Company Unique ID', 'Experience Certificate No', 'Contract Amount', 'Contract Start & End Date', 'Work Status'])
df
  • I understand that the table is created using javascript. How do I select the table, in this case? Commented Dec 29, 2022 at 5:22

1 Answer

Finding the URL

You don't actually need Selenium here. The data is available by sending a POST request to:

https://www.eprocure.gov.bd/AdvSearcheCMSServlet
  • How did I find this URL?

If you inspect your browser's network calls (press F12 and open the Network tab), you'll see the following:

[Screenshot: the browser's Network tab]

Take note of the "Payload" tab:

[Screenshot: the request's "Payload" tab]

The fields shown there are used as the data dictionary in the example below.
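If the Payload tab is switched to its raw, URL-encoded view, the string can be pasted into Python and converted to a dictionary instead of being retyped by hand. This is only a sketch: the `raw` string below is a shortened, hand-written stand-in built from a few of the field names in this answer, not a copy of the real request.

```python
from urllib.parse import parse_qsl

# Shortened stand-in for the URL-encoded payload (assumption: the real
# request carries more fields, as listed in the full example).
raw = "action=geteCMSList&keyword=&officeId=0&statusTab=eTenders&pageNo=1&size=10&workStatus=All"

# keep_blank_values=True preserves empty fields such as "keyword",
# which parse_qsl would otherwise drop.
payload = dict(parse_qsl(raw, keep_blank_values=True))
print(payload)
```

Keeping the blank fields matters here because the form submits many empty strings, and it is safer to send the same set of keys the browser sends.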

Great, but how do I get the data including paginating the page?

The example below fetches the HTML table and increments pageNo on each request to paginate (this is for the "eTenders" table/tab):

import requests
import pandas as pd
from bs4 import BeautifulSoup


data = {
    "action": "geteCMSList",
    "keyword": "",
    "officeId": "0",
    "contractAwardTo": "",
    "contractStartDtFrom": "",
    "contractStartDtTo": "",
    "contractEndDtFrom": "",
    "contractEndDtTo": "",
    "departmentId": "",
    "tenderId": "",
    "procurementMethod": "",
    "procurementNature": "",
    "contAwrdSearchOpt": "Contains",
    "exCertSearchOpt": "Contains",
    "exCertificateNo": "",
    "tendererId": "",
    "procType": "",
    "statusTab": "eTenders",
    "pageNo": "1",
    "size": "10",
    "workStatus": "All",
}


_columns = [
    "S. No",
    "Ministry, Division, Organization, PE",
    "Procurement Nature, Type & Method",
    "Tender/Proposal ID, Ref No., Title..",
    "Contract Awarded To",
    "Company Unique ID",
    "Experience Certificate No  ",
    "Contract Amount",
    "Contract Start & End Date",
    "Work Status",
]

for page in range(1, 11):  # <--- Increase number of pages here
    print(f"Page: {page}")
    data["pageNo"] = page

    response = requests.post(
        "https://www.eprocure.gov.bd/AdvSearcheCMSServlet", data=data
    )
    # The response is missing the enclosing `table` tag, so we add it back
    soup = BeautifulSoup("<table>" + response.text + "</table>", "html.parser")
    # Note: `df` is overwritten on every iteration, so it only holds the current page
    df = pd.read_html(
        str(soup),
    )[0]

    df.columns = _columns
    print(df.to_string())
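The loop above prints each page separately. If a single DataFrame is wanted for analysis, one possible approach is to collect the per-page frames in a list and concatenate them once at the end. The sketch below assumes the same endpoint and response shape as the example; `fetch_page` and `collect_pages` are hypothetical helper names, and the `timeout` value is an arbitrary choice.

```python
from io import StringIO

import pandas as pd
import requests

URL = "https://www.eprocure.gov.bd/AdvSearcheCMSServlet"


def fetch_page(payload, page):
    """POST the search payload for one page and parse the returned rows."""
    response = requests.post(URL, data={**payload, "pageNo": str(page)}, timeout=30)
    response.raise_for_status()
    # The response is a bare run of table rows, so wrap it in <table> tags.
    html = "<table>" + response.text + "</table>"
    return pd.read_html(StringIO(html))[0]


def collect_pages(fetch, pages):
    """Call fetch(page) for every page and stack the results into one frame."""
    frames = [fetch(page) for page in pages]
    return pd.concat(frames, ignore_index=True)


# Usage, with `data` and `_columns` as defined in the example above:
# df = collect_pages(lambda page: fetch_page(data, page), range(1, 11))
# df.columns = _columns
```

Passing the fetch function in as an argument keeps the concatenation logic separate from the network call, which makes it easy to try out with a few hand-built frames before hitting the site.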

Going further

How do I select the different tabs/tables on the page?

To select a different tab on the page, change the "statusTab" value in data. Inspect the Payload tab again while clicking each tab on the page to see which value it sends.
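As a rough sketch of switching tabs, the payload can be copied with a new "statusTab" value rather than edited in place. `payload_for` is a hypothetical helper, `base` is a shortened version of the full data dictionary, and `OTHER_TAB` is only a placeholder: "eTenders" is the one tab value shown in this answer, so any other value has to be read from the Payload tab.

```python
def payload_for(base, status_tab, page=1):
    """Return a copy of `base` with the tab and page number replaced."""
    return {**base, "statusTab": status_tab, "pageNo": str(page)}


# Shortened stand-in for the full `data` dictionary in the example.
base = {"action": "geteCMSList", "statusTab": "eTenders", "pageNo": "1", "size": "10"}

# Placeholder, not a real value: copy the actual string from the Payload tab.
OTHER_TAB = "<value copied from the Payload tab>"

other = payload_for(base, OTHER_TAB, page=3)
print(other["statusTab"], other["pageNo"])
```

Building a new dictionary each time leaves the original payload untouched, so the same base can be reused for every tab.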

Output

The above code outputs:

   S. No                                                                              Ministry, Division, Organization, PE Procurement Nature, Type & Method                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Tender/Proposal ID, Ref No., Title..             Contract Awarded To  Company Unique ID                                                    Experience Certificate No\t  Contract Amount Contract Start & End Date Work Status
0      1  Ministry of Education, Education Engineering Department, Office of the Executive Engineer, EED,Kishoreganj Zone.                   Works, NCT, LTM                                                                                                                                                                                                                                                                                                                                                                  300580, 932/EE/EED/KZ/Rev-5974/2018-19/23, Dt: 28/03/2019 Repair and Renovation Works at Chowganga Shahid Smrity High School Itna Kishoreganj. 01-Apr-2019   M/S KAZI RASEL NIRMAN SONGSTA            1051854                                       WD-5974- 25/e-GP/20221228/300580/0060000       475000.000   10-Jun-2019 03-Sep-2019   Completed
1      2            Ministry Of Water Resourses, Bangladesh Water Development Board (BWDB), Chattogram Mechanical Division                   Works, NCT, LTM                       558656, CMD/T-19/100 Dated: 14-03-2021 Manufacturing supplying & installation of 01 No MS Flap gate size - 1.65 m 1.95m and 01 no. Padestal type lifting device for sluice no S-15 6-vent 02 nos MS Vertical gate size - 1.65 m 1.95m for sluice no S-15 6-vent and sluice no S-14 new 1-vent at Coxs Bazar Sadar Upazilla of CEP Polder No 66/1 under Coxsbazar O&M Division implemented by Chattogram Mechanical Division BWDB Madunaghat Chattogram during the financial year 2020-21. 15-Mar-2021             M/S. AN Corporation            1063426                            CMD/COX/LTM-16/2020-21/e-GP/20221228/558656/0059991       503470.662   12-Apr-2021 05-May-2021   Completed
2      3            Ministry Of Water Resourses, Bangladesh Water Development Board (BWDB), Chattogram Mechanical Division                   Works, NCT, LTM                                                                633496, CMD/T-19/263 Dated: 30-11-2021 Manufacturing, supplying & installation of 07 No M.S Flap gate for sluice no.- 6 (1-vent), sluice no.- 7 (2-vent), sluice no.-8 (2-vent), sluice no.-35 (2-vent) size :- (1.00 m Ã?1.00m), 01 No Padestal type lifting device for sluice no- 13(1-vent) for CEP Polder No 64/2B, at pekua Upazilla under Chattogram Mechanical Division, BWDB, Madunaghat, Chattogram, during the financial year 2021-22. 30-Nov-2021             M/S. AN Corporation            1063426                                CMD/LTM-08/2021-22/e-GP/20221228/633496/0059989       648808.272   26-Dec-2021 31-Jan-2022   Completed
...
...

4 Comments

Hi, MendelG! Thank you for your grand explanation! I figured the first after I submitted the post! Thank you so much for the second part! I believe it would work, but unfortunately, I have been dealing with another issue now. I use Jupyter and it causes an error HTTPSConnectionPool(host='www.eprocure.gov.bd', port=443): Max retries exceeded with url: /AdvSearcheCMSServlet (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1129)')))
@Nazmul That's a completely separate question for which I'm not sure. Consider marking this answer as accepted and ask a new question on StackOverflow.
I solved the earlier issue. Your code worked! However, I do want to turn the paginated table into a single dataframe for data analysis. But the dataframe only contains the 10 rows of the last page.
@Nazmul Hmm, I'm not too strong with pandas, but I think you can create a list outside of the loop and append each page's DataFrame to it. If you have further questions, consider asking them as a new question so as not to clutter up the comment section.
