1

I'm trying to scrape the contents of a table. I believe the table is rendered by JavaScript, so I'm using the selenium package with Python 3. I've seen others locate a table by its XPath in order to scrape its contents, but I'm not sure how to identify the correct XPath.

How can I extract the table's contents? If I use an XPath, how do I identify the correct XPath(s) for the table or its contents by inspecting the web page's source?

import io

import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome('path/to/chromedriver.exe')
url = 'https://ultrasignup.com/results_event.aspx?did=6727'
driver.get(url)

# Now I need to get the table's contents. I might do something like this:
table = driver.find_element(By.XPATH, 'my_xpath')  # a single element, not a list
# outerHTML keeps the enclosing <table> tag, which read_html needs (innerHTML drops it)
table_html = table.get_attribute('outerHTML')
df = pd.read_html(io.StringIO(table_html))[0]
print(df)
driver.close()

  • I believe there is no need to scrape, because they have an API. If you visit this link you will see nicely formatted data from the table you provided: ultrasignup.com/service/events.svc/results/6727/json?rows=1500 Commented Jun 23, 2019 at 18:28
  • The page-under-test has many page elements with id attributes. Locating via id will be less fragile; YMMV. Commented Jun 23, 2019 at 18:33
  • @andreilozhkin you began to post some code that looked helpful, but then removed it. I could accept your answer if you put it back up! Commented Jun 23, 2019 at 19:27

2 Answers

1

I believe there is no need to scrape, because they have an API.

If you visit this link you will see nicely formatted data from the table you provided: https://ultrasignup.com/service/events.svc/results/6727/json

Some code:

import requests

url = 'https://ultrasignup.com/service/events.svc/results/6727/json'

response = requests.get(url)
response.raise_for_status()  # stop early on an HTTP error

# Get all people from the table (the response body is parsed as JSON)
people = response.json()

# Print first person's information
print(people[0])
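
If the goal is a table rather than raw JSON, the records can be flattened with the standard library. This is only a sketch: it assumes the endpoint returns a list of flat JSON objects, and the field names in `sample_people` are made up for illustration, not taken from the real response. `records_to_csv` is a hypothetical helper name.

```python
import csv
import io


def records_to_csv(records):
    """Flatten a list of JSON objects (dicts) into CSV text."""
    # Use the union of keys, in first-seen order, as the header row.
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)  # missing keys are filled with restval
    return buffer.getvalue()


# Hypothetical stand-in for response.json(); the real field names may differ.
sample_people = [
    {"place": 1, "name": "Runner A", "time": "3:59:10"},
    {"place": 2, "name": "Runner B"},
]
csv_text = records_to_csv(sample_people)
print(csv_text)
```

If pandas is installed and the response really is a list of dicts, pandas.DataFrame(people) should give a DataFrame directly.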

Hope it helps!


0

You can identify the correct XPath by inspecting the table's elements in the browser's developer tools and looking at the surrounding source. Once you see which tags contain the table, build your XPath step by step, one enclosing tag at a time.

For example:


<div class="test">
<p class="test2">
<table class="test3"> 
<!--May have more attributes-->
contents...
</table>
</p>
</div>

Then you begin your XPath with //div[@class="test"]. Now you are inside the div.

Next step: //div[@class="test"]//p[@class="test2"]. Now you are inside the paragraph tag.

Final Step:

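from selenium.webdriver.common.by import By

xpath = "//div[@class='test']//p[@class='test2']//table[@class='test3']"

# Pass the variable itself (not the string 'xpath'); find_element returns a single element
table = driver.find_element(By.XPATH, xpath)

Now you can access the table and read whatever attributes you want, or the table's contents.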
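
The step-wise idea can be tried offline before pointing Selenium at a real page. The sketch below uses Python's xml.etree.ElementTree as a stand-in for the browser. It supports only a subset of XPath, and paths must start with `.` so they are relative to the root. The markup and the runner rows are hypothetical, modelled on the example above.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup mirroring the example above, wrapped in <html><body>
# so that the outer <div> is a descendant of the root element.
SAMPLE = """
<html><body>
  <div class="test">
    <p class="test2">
      <table class="test3">
        <tr><th>Place</th><th>Name</th></tr>
        <tr><td>1</td><td>Runner A</td></tr>
        <tr><td>2</td><td>Runner B</td></tr>
      </table>
    </p>
  </div>
</body></html>
""".strip()

root = ET.fromstring(SAMPLE)

# Build the path one enclosing tag at a time and check each step matches.
steps = [
    ".//div[@class='test']",
    ".//div[@class='test']//p[@class='test2']",
    ".//div[@class='test']//p[@class='test2']//table[@class='test3']",
]
matches_per_step = [len(root.findall(step)) for step in steps]
print(matches_per_step)

# With the table located, read each row's cell text.
table = root.find(steps[-1])
rows = [[cell.text for cell in tr] for tr in table.findall("tr")]
print(rows)
```

If a step matches nothing, the previous step shows where the path went wrong. On a live page, check the structure in the developer tools' Elements panel rather than the raw source, since the browser may rearrange invalid nesting (a table inside a p, for instance) when it builds the DOM.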

1 Comment

Thanks YOGOVO, this begins to help me better understand the structure of the HTML source code. Would you be able to identify example XPaths based on the webpage I provided? I am still struggling to identify the correct tags from the source code.
