This question has been asked numerous times before, but I have tried every solution I could find without success. In short, I am scraping a table of members and can collect every column except the last, which contains a button with a hyperlink to the member's email address. The hyperlink does not appear to be hidden, since the email is visible when the cursor hovers over the button; however, I cannot select the button element and print out the hyperlink. Below is the XPath to the first email address in the table (column 5):

/html/body/div[5]/div[1]/main/div/div[5]/div/div/div/table/tbody/tr[1]/td[5]/a

Below is the element for this same first email address of the table

<a href="mailto:[email protected]"><span id="ember2071" class="ember-view aia-icon"><svg class="icon" version="1.1" id="Layer_1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" x="0px" y="0px" viewBox="0 0 40 40" style="enable-background:new 0 0 40 40;" xml:space="preserve">
<path class="st0" d="M5.5,8.3v23.5h30.8V8.3H5.5z M8.6,26.4V13.6l6.3,6.4L8.6,26.4z M21.5,21.1c-0.2,0.3-0.9,0.3-1.2,0l-9.6-9.7
    h20.4L21.5,21.1z M18.1,23.3c0.7,0.7,1.7,1.1,2.8,1.1c1.1,0,2.1-0.4,2.8-1.1l1-1.1l6.3,6.4H10.7l6.3-6.5L18.1,23.3z M26.9,20
    l6.2-6.3v12.7L26.9,20z"></path>
</svg>
</span></a>

Below is my script for pulling the email addresses. Eventually I would also like the script to write the email addresses to a CSV file, in a column separate from the other columns, but that is for a separate discussion.

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# open chrome
# driver = Webdriver.chrome("C:\Python Tools\chromedriver.exe")
# raw string so the backslashes in the Windows path are not treated as escapes
s = Service(r"C:\Python Tools\chromedriver.exe")
driver = webdriver.Chrome(service=s)

# navigate to site and sign-in
driver.get("https://account.aia.org/signin?redirectUrl=https:%2F%2Fwww.aia.org%2F")
driver.implicitly_wait(10)
username = driver.find_element(By.ID, "mat-input-0")
password = driver.find_element(By.ID, "mat-input-1")
username.send_keys("[email protected]")
password.send_keys("Test1234!")
driver.find_element(By.CLASS_NAME, "mat-button-wrapper").click()
driver.implicitly_wait(10)

# close cookies box
driver.find_element(By.XPATH, '//*[@id="truste-consent-button"]').click()

# navigate to the member directory
driver.implicitly_wait(10)
driver.get("https://www.aia.org/member-directory?page%5Bnumber%5D=1")
driver.implicitly_wait(10)
# extract email addresses: list of tried and failed find element queries
# v1 = driver.find_elements(By.XPATH, "//button[contains(text(),'mailto')]")
# v1 = driver.find_elements(By.XPATH,'//a[contains(@href,".com")]')
# v1 = driver.find_elements(By.PARTIAL_LINK_TEXT, ".com")
# v1 = driver.find_elements(By.XPATH, '//a[contains(@href,"href")]')
# v1 = driver.find_elements(By.XPATH, '//a[@href="'+url+'"]')
# v1 = driver.find_elements(By.XPATH, "//a[contains(text(),'Verify Email')]").getAttribute('href')
# v1 = driver.find_elements(By.CLASS_NAME, "ember-view aia-icon").get_attribute("href")
# v1 = driver.find_elements(By.TAG_NAME, "a").getAttribute("href")
# v1 = driver.find_elements(By.XPATH,("//input[contains(td[5])]")).getAttribute("href")
# v1 = driver.find_elements(By.cssSelector("mailto").getAttribute("href")
# v1 = driver.find_elements(By.CLASS_NAME, "data-table").getAttribute("href")
# v1 = driver.find_elements(By.XPATH, "//div[@id='testId']/a").getAttribute("href")
# v1 = driver.find_elements(By.cssSelector("mailto")
# v1 = driver.find_elements(By.TAG_NAME, "td[5]")
# v1 = driver.find_elements(By.XPATH,("//input[contains(td[5])]"))
# v1 = driver.find_elements(By.TAG_NAME, "a")
# v1 = driver.find_elements(By.CLASS_NAME, "ember-view aia-icon")
print(v1)
# export email addresses to CSV
import csv

with open('AIAMemberSearch.csv', 'w', newline='') as file:
    writer = csv.writer(file, quoting=csv.QUOTE_ALL,delimiter=';')
    writer.writerows(v1)

Secondly, I would like to collect data from all five columns of the table and export it to CSV, looping over all pages of the member directory. My draft code is below:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# open chrome
# driver = Webdriver.chrome("C:\Python Tools\chromedriver.exe")
# raw string so the backslashes in the Windows path are not treated as escapes
s = Service(r"C:\Python Tools\chromedriver.exe")
driver = webdriver.Chrome(service=s)

# navigate to site
driver.get("https://account.aia.org/signin?redirectUrl=https:%2F%2Fwww.aia.org%2F")
driver.implicitly_wait(10)

# enter login
username = driver.find_element(By.ID, "mat-input-0")
password = driver.find_element(By.ID, "mat-input-1")
username.send_keys("[email protected]")
password.send_keys("Test1234!")
driver.find_element(By.CLASS_NAME, "mat-button-wrapper").click()
driver.implicitly_wait(10)

# close cookies box
# old way driver.find_element_by_xpath('//*[@id="truste-consent-button"]').click()
driver.find_element(By.XPATH, '//*[@id="truste-consent-button"]').click()


driver.implicitly_wait(10)

# for holding the resultant list 
element_list = []
for page in range(1, 3, 1):

    page_url = "https://www.aia.org/member-directory?page%5Bnumber%5D=" + str(page)
    driver.get(page_url)
    driver.implicitly_wait(10)
    # collect name, chapter, firm and location columns (not working, needs a loop)
    v1 = driver.find_elements(By.CLASS_NAME, "data-table")
    # collect email addresses (working)
    v2 = driver.find_elements(By.XPATH, '//a [contains(@href,"mailto")][@href]')
    for i in v2:
        email = i.get_attribute("href")
    # pair each table element's text with the matching link's text
    for i in range(len(v1)):
        element_list.append([v1[i].text, v2[i].text])
# export to csv
import csv
with open('AIAMemberSearch.csv', 'w', newline='') as file:
    writer = csv.writer(file, quoting=csv.QUOTE_ALL,delimiter=';')
    writer.writerows(element_list)
  • FYI, credentials in the above script are purely for this example, and not personal. Commented Feb 6, 2022 at 18:56

2 Answers

First, DON'T SHARE YOUR CREDENTIALS AT ALL.

Second, share as much HTML as possible without sharing credentials.

This should work; I tested it:

v1 = driver.find_elements(By.XPATH, '//a [contains(@href,"mailto")][@href]')
for i in v1:
    email = i.get_attribute("href")
    print (email)

1 Comment

To get only the email address from the href, apply a regex.
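
As a minimal sketch of that regex step: the helper name `email_from_href` and the sample addresses are made up for illustration, and the pattern simply assumes the href has the form mailto:address, optionally followed by ?query.

```python
import re
from urllib.parse import unquote

# Capture everything after "mailto:" up to an optional "?subject=..." part.
MAILTO_RE = re.compile(r"^mailto:([^?]+)", re.IGNORECASE)


def email_from_href(href):
    """Return the address inside a mailto: link, or "" if href is not one.

    Hypothetical helper for illustration, not part of Selenium.
    """
    match = MAILTO_RE.match(href or "")
    # unquote() turns percent-escapes such as %40 back into "@".
    return unquote(match.group(1)) if match else ""


print(email_from_href("mailto:jane.doe@example.org"))
print(email_from_href("mailto:jane.doe@example.org?subject=Hello"))
```

In the loop above, this would be applied to the value returned by i.get_attribute("href").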

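You should be using .get_attribute('href') instead of .getAttribute(); the latter is the Java binding's name for the method, not the Python one. Also note that find_elements returns a list, so get_attribute has to be called on each element rather than on the list itself.

With that in mind, you can get all of the emails like this: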

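from selenium.common.exceptions import NoSuchElementException

# [1:] skips the first row of the table
for item in driver.find_element(By.CLASS_NAME, 'data-table').find_elements(By.TAG_NAME, 'tr')[1:]:
    try:
        v1 = item.find_element(By.TAG_NAME, 'a').get_attribute("href")
    except NoSuchElementException:
        continue  # this row doesn't have an email link
    print(v1)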

To get the rest of the table information you can do something like this:

from selenium.common.exceptions import NoSuchElementException

data_table = []
table = driver.find_element(By.CLASS_NAME, 'data-table')
for item in table.find_elements(By.TAG_NAME, 'tr')[1:]:  # [1:] skips the first row
    elements = item.find_elements(By.TAG_NAME, 'td')
    name = elements[0].text
    aia_branch = elements[1].text
    company = elements[2].text
    location = elements[3].text
    try:
        email = item.find_element(By.TAG_NAME, 'a').get_attribute("href")
    except NoSuchElementException:
        email = ""  # this row doesn't have an email link
    # every row gets five cells, so the CSV columns stay aligned
    data_table.append([name, aia_branch, company, location, email])
    print(name, aia_branch, company, location, email)
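
The question also asks about running this over several pages of the directory. A rough sketch of the paging part only, using the page%5Bnumber%5D URL from the question: `page_url`, `scrape_pages` and the `load_rows` callback are hypothetical names, and `load_rows` stands in for "call driver.get(url), then run the row loop above and return its rows".

```python
from urllib.parse import urlencode

BASE_URL = "https://www.aia.org/member-directory"


def page_url(number):
    """Build the directory URL for one page; urlencode escapes the brackets."""
    return f"{BASE_URL}?{urlencode({'page[number]': number})}"


def scrape_pages(load_rows, first_page, last_page):
    """Collect rows from first_page through last_page, inclusive.

    load_rows is a hypothetical callable: it takes a page URL and returns
    that page's rows as a list of lists.
    """
    all_rows = []
    for number in range(first_page, last_page + 1):
        all_rows.extend(load_rows(page_url(number)))
    return all_rows


print(page_url(1))
```

Keeping the paging separate from the Selenium calls means the row loop stays unchanged and only the URL changes per page.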

Now, to save all of that data to a CSV file, you can do this:

import csv

with open('AIAMemberSearch.csv', 'w', newline='') as file:
    writer = csv.writer(file, delimiter=',')
    writer.writerows(data_table)
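
If you also want a header row and a guarantee that every record has the same number of cells, something along these lines may help. It is only a sketch: `write_members_csv` is a hypothetical helper, the column names are taken from the question's description of the table, and the sample rows are invented.

```python
import csv
import io

# Column names as described in the question, plus the email column.
HEADER = ["name", "chapter", "firm", "location", "email"]


def write_members_csv(file, rows):
    """Write a header plus rows to an open text file.

    Rows shorter than the header are padded with empty strings and longer
    rows are truncated, so every record has the same number of cells.
    """
    writer = csv.writer(file)
    writer.writerow(HEADER)
    for row in rows:
        padded = list(row) + [""] * (len(HEADER) - len(row))
        writer.writerow(padded[:len(HEADER)])


# Demo with an in-memory file and invented rows; for real output use
# open('AIAMemberSearch.csv', 'w', newline='') and pass data_table.
buffer = io.StringIO()
write_members_csv(buffer, [
    ["Jane Doe", "AIA New York", "Example Architects", "New York, NY"],
    ["John Roe", "AIA Chicago", "Sample Studio", "Chicago, IL", "mailto:john@example.org"],
])
print(buffer.getvalue())
```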

5 Comments

Thank you Andrew. I really like how you set up this query, and it works perfectly. How would I revise the above loop to pull just column 1 of the data-table? Ultimately I want to be able to export each column to CSV and have all the rows properly indexed.
@Juzek2000 This is only getting the emails currently (1 column). What are the other columns of data that you would like to get? Do you just want to convert the table to a CSV structure?
Yes, I just want to convert the entire table to CSV and include the email addresses, with a separate row for each member. I can run driver.find_elements(By.CLASS_NAME, 'data-table') to get the whole table except for the email addresses, although my export to CSV pastes the entire table into a single CSV cell.
I updated my original post with my second issue as described above. Thank you all for your help!
Again, thank you for the excellent work!! This works magnificently!!
