
I wrote this Python code using the Selenium library to scrape all 239 rows from a table on a website. I was able to scrape the first four columns successfully using XPath selectors, but scraping the last four columns kept returning empty values ("") even though the elements are present on the page.

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
#from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
import pandas as pd

url = 'https://www.adducation.info/general-knowledge-travel-and-transport/emergency-numbers/'
path= 'xxxxxxxxxx'


service=Service(executable_path=path)
driver=webdriver.Chrome(service=service)
driver.get(url)
driver.implicitly_wait(20)
time.sleep(10)

containers = driver.find_elements(by='xpath', value='//tr')

Country = []
Emergency = []
Police = []
Ambulance = []
Fire = []
Group = []
Calling_codes = []
Local_emergency_no = []


P=range(1,240)
for i,j in zip(containers,P):
    try:
        A = i.find_element(by='xpath',value=f'//tr[{j}]/td[1]/strong').text  
        B = i.find_element(by='xpath',value=f'//tr[{j}]/td[2]').text
        C = i.find_element(by='xpath',value=f'//tr[{j}]/td[3]').text
        D = i.find_element(by='xpath',value=f'//tr[{j}]/td[4]').text
        E = i.find_element(by='xpath',value=f'//tr[{j}]/td[5]').text
        F = i.find_element(by='xpath',value=f'//tr[{j}]/td[6]').text
        G = i.find_element(by='xpath',value=f'//tr[{j}]/td[7]').text
        H = i.find_element(by='xpath',value=f'//tr[{j}]/td[8]').text
    
    except:
        A = i.find_element(by='xpath',value=f'//tr[{j}]/td[1]/em/strong').text
        B = i.find_element(by='xpath',value=f'//tr[{j}]/td[2]').text
        C = i.find_element(by='xpath',value=f'//tr[{j}]/td[3]').text
        D = i.find_element(by='xpath',value=f'//tr[{j}]/td[4]').text
        E = i.find_element(by='xpath',value=f'//tr[{j}]/td[5]').text
        F = i.find_element(by='xpath',value=f'//tr[{j}]/td[6]').text
        G = i.find_element(by='xpath',value=f'//tr[{j}]/td[7]').text
        H = i.find_element(by='xpath',value=f'//tr[{j}]/td[8]').text
    

    finally:
        Country.append(A)
        Emergency.append(B)
        Police.append(C)
        Ambulance.append(D)
        Fire.append(E)
        Group.append(F)
        Calling_codes.append(G)
        Local_emergency_no.append(H)

dict_={'Country' : Country,
    'Emergency' : Emergency, 
    'Police' : Police, 
    'Ambulance' : Ambulance, 
    'Fire' : Fire, 
    'Continent' : Group, 
    'Calling_codes' : Calling_codes,
    'Local_emergency_no' : Local_emergency_no
    }

Emergency_DS = pd.DataFrame(dict_)
print(Emergency_DS)

3 Answers


To scrape the data from the Emergency Numbers List table on the page Emergency Numbers List: 911, 112 & 999 Numbers Worldwide, you need to induce WebDriverWait for visibility_of_element_located() on the <table> element and then build a pandas DataFrame from it. You can use the following locator strategy:

Code Block:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd

options = Options()
options.add_argument("start-maximized")
driver = webdriver.Chrome(options=options)
driver.get(url='https://www.adducation.info/general-knowledge-travel-and-transport/emergency-numbers/')
table_data = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "table.dataList.footable-loaded.footable.default"))).get_attribute("outerHTML")
df = pd.read_html(table_data)
print(df)
driver.quit()

Console Output:

[    Country / Territory ☎ Emergency  ... Calling codes                     Local emergency numbers & info
0        🇦🇫 Afghanistan         NaN  ...           +93  You can dial 020 112 from mobile but only   in K...
1            🇦🇱 Albania         NaN  ...          +355                                                  NaN
2            🇩🇿 Algeria         NaN  ...          +213                      Dial 1548 for tourist   police.
3     🇦🇸 American Samoa         911  ...        +1 684                                                  NaN
4            🇦🇩 Andorra         112  ...          +376                                                  NaN
..                  ...         ...  ...           ...                                                ...
234  🇼🇫 Wallis & Futuna         NaN  ...          +681                                                  NaN
235   🇪🇭 Western Sahara         150  ...          +212            This disputed state is part of M  orocco.
236            🇾🇪 Yemen         NaN  ...          +967                                                  NaN
237           🇿🇲 Zambia         112  ...          +260                                                  NaN
238         🇿🇼 Zimbabwe         999  ...          +264                                                  NaN

[239 rows x 8 columns]]
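
Note that pd.read_html() returns a list of DataFrames, which is why the printed output above ends with a doubled bracket. If you want the table as a single DataFrame, index into that list; a minimal sketch reusing the table_data variable from the snippet above:

df = pd.read_html(table_data)[0]  # read_html() returns a list of tables; take the first (and only) one
print(df.shape)  # prints (239, 8)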





You do not need Selenium (a testing framework, vastly misused for web scraping purposes) to obtain that data. Here is another way:

import pandas as pd
df = pd.read_html('https://www.adducation.info/general-knowledge-travel-and-transport/emergency-numbers/')[0]
print(df)

Result in terminal:

    Country / Territory     ☎ Emergency     ☎ Police    ☎ Ambulance     ☎ Fire  Group   Calling codes   Local emergency numbers & info
0   🇦🇫 Afghanistan    NaN     119     119, 102    112, 119    Asia    +93     You can dial 020 112 from mobile but only in Kabul.
1   🇦🇱 Albania    NaN     129     127     128     Europe  +355    NaN
2   🇩🇿 Algeria    NaN     17  14  14  Africa  +213    Dial 1548 for tourist police.
3   🇦🇸 American Samoa     911     NaN     NaN     NaN     Oceania     +1 684  NaN
4   🇦🇩 Andorra    112     110     118     118     Europe  +376    NaN
...     ...     ...     ...     ...     ...     ...     ...     ...
234     🇼🇫 Wallis & Futuna    NaN     18  15  17  French, Oceania     +681    NaN
235     🇪🇭 Western Sahara     150     NaN     NaN     NaN     Africa  +212    This disputed state is part of Morocco.
236     🇾🇪 Yemen  NaN     194     191     191     Asia    +967    NaN
237     🇿🇲 Zambia     112     999     993     991     Africa  +260    NaN
238     🇿🇼 Zimbabwe   999     995     994     993     Africa  +264    NaN

239 rows × 8 columns

You can also save that DataFrame as a .csv file if you want; see the pandas documentation for more information.
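
For example, a minimal sketch (the filename here is just a placeholder):

df.to_csv('emergency_numbers.csv', index=False)  # write the table to a CSV file without the row index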

4 Comments

Yeah, thank you for this. I already have some knowledge of this; I'm working on using Selenium to web scrape for a personal project, so I was hoping I could find a way around it.
This answer addresses the issue in your question, namely getting the respective data. @MubaraqOnipede
I just got an answer using Selenium. I included driver.maximize_window() to access the whole window.
And again, why do you feel Selenium is vastly misused for web scraping purposes, as opposed to being used as a testing framework?

If you still want to use Selenium for this, you just have to add this line at the beginning of the script:

driver.maximize_window()

Otherwise the table doesn't appear in full. You can also decrease your implicit wait, and you have to handle the cookie popup if it appears.
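
For completeness, here is a minimal sketch of a Selenium version with the window maximized. It assumes Selenium 4.6+ (which resolves the ChromeDriver binary automatically) and simply reads every <td> of every row instead of relying on fixed column indexes, so it is just one way to collect the cells, not the asker's exact script:

from selenium import webdriver
from selenium.webdriver.common.by import By
import pandas as pd

driver = webdriver.Chrome()   # Selenium 4.6+ locates the driver automatically
driver.maximize_window()      # without this the responsive table hides the last columns
driver.get('https://www.adducation.info/general-knowledge-travel-and-transport/emergency-numbers/')
driver.implicitly_wait(10)    # a shorter implicit wait is usually enough here

rows = driver.find_elements(By.XPATH, '//table//tr')
# keep only rows that actually contain <td> cells, which skips the header row
data = [[cell.text for cell in row.find_elements(By.XPATH, './td')]
        for row in rows if row.find_elements(By.XPATH, './td')]
driver.quit()

print(pd.DataFrame(data))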

1 Comment

Thank youuuuu, you are a lifesaver!! I just figured it out now. With the window maximized I was able to scrape it all. Thanks!
