I would like to extract all the data in the row named "Nb B" on this page: https://www.coteur.com/cotes-foot.php

Here is my Python script:

#!/usr/bin/python3
# -*- coding: utf-8 -*-

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.headless = True
driver = webdriver.Chrome(options=options)

driver.get('https://www.coteur.com/cotes-foot.php')

#Store url associated with the soccer games
url_links = []
for i in driver.find_elements_by_xpath('//a[contains(@href, "match/cotes-")]'):
    url_links.append(i.get_attribute('href'))

print(len(url_links), '\n')

nb_bookies = []
for i in driver.find_elements_by_xpath('//td[contains(@class, " odds")][contains(@style, "")]'):
    nb_bookies.append(i.text)
    
print(nb_bookies) 

And here is the output:

25 

['1.80', '3.55', '4.70', '95%', '', '1.40', '4.60', '8.00', '94.33%', '', '2.35', '3.42', '2.63', '90.18%', '', '3.20', '3.60', '2.05', '92.19%', '', '7.00', '4.80', '1.35', '90.81%', '', '5.30', '4.30', '1.70', '99.05%', '', '2.15', '3.55', '3.65', '97.92%', '', '2.90', '3.20', '2.20', '88.81%', '', '3.95', '3.40', '2.10', '97.65%', '', '2.00', '3.80', '3.90', '98.04%', '', '2.40', '3.05', '3.50', '96.98%', '', '3.70', '3.20', '2.00', '91.72%', '', '2.75', '2.52', '3.05', '91.17%', '', '4.20', '3.05', '1.69', '84.23%', '', '1.22', '5.10', '10.00', '88.42%', '', '1.54', '4.60', '5.10', '93.72%', '', '3.00', '3.10', '2.45', '93.59%', '', '2.40', '3.50', '2.55', '90.55%', '', '1.76', '3.50', '4.20', '90.8%', '', '11.50', '5.30', '1.36', '98.91%', '', '3.00', '3.50', '2.20', '92.64%', '', '1.72', '3.42', '5.00', '92.62%', '', '1.08', '9.25', '19.00', '91.33%', '', '9.75', '5.75', '1.36', '98.82%', '', '5.70', '4.50', '1.63', '98.88%', '']

All the data in the table is extracted, and the "Nb B" values come back as empty strings (''), whereas those are the only values I want.

  • I think you mean the data in the "Nb B" column? Commented Jul 9, 2020 at 15:57
  • Yes, that is it. Commented Jul 9, 2020 at 17:05
  • @ahmedaao Do see my answer and let me know how you get on. Commented Jul 9, 2020 at 20:15

2 Answers

1

To get the data from the last column only, adjust your XPath so that it selects the last cell of each match row:

nb_bookies = []
for i in driver.find_elements_by_xpath('//tr[@id and @role="row" ]/td[last()]'):
    nb_bookies.append(i.text)

Output:

['12', '12', '1', '9', '11', '12', '12', '12', '12', '12', '11', '2', '11', '11', '9', '12', '11', '12', '12', '12', '12', '12', '10', '5', '12']
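
To see what td[last()] selects without launching a browser, here is a minimal sketch that runs a similar path over a tiny hand-written table. The sample markup and the ids m1 to m3 are hypothetical stand-ins, not the real page. The standard-library ElementTree only supports a subset of XPath, so the `and` condition is written as two chained predicates here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the odds table (not the real page markup).
SAMPLE = """
<table>
  <tr role="row"><th>1</th><th>N</th><th>2</th><th>Payout</th><th>Nb B</th></tr>
  <tr id="m1" role="row"><td>1.80</td><td>3.55</td><td>4.70</td><td>95%</td><td>12</td></tr>
  <tr id="m2" role="row"><td>1.40</td><td>4.60</td><td>8.00</td><td>94.33%</td><td>12</td></tr>
  <tr id="m3" role="row"><td>2.35</td><td>3.42</td><td>2.63</td><td>90.18%</td><td>1</td></tr>
</table>
"""

root = ET.fromstring(SAMPLE)

# Keep only rows that have an id and role="row", then take the last <td> of each.
# The header row has no id, so it is skipped.
last_cells = [td.text for td in root.findall(".//tr[@id][@role='row']/td[last()]")]

print(last_cells)  # ['12', '12', '1']
```

Only the final cell of each data row is returned, which is why the odds and payout values no longer appear in the result.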

1

Your code is fine; the problem is the size of the window that the driver opens in headless mode. The default window size and display size in headless mode is 800x600 on all platforms.

The site's developers have set the header to appear only when the window is wider than 1030px; only then is display: none; removed from the DOM. You can test this for yourself by shrinking and expanding the browser window.

If an element's style attribute contains display: none;, the element is hidden and Selenium cannot interact with it; its text also comes back as an empty string, which is where the '' values in your output come from. In other words, if a user can't see the element, Selenium can't either.

Adding this line to enlarge the window in headless mode will solve the problem:

options.add_argument("window-size=1400,800")
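
Putting the pieces together, here is a rough sketch of how the whole script might look with the larger window. It assumes a recent Selenium 4 release, where find_elements(By.XPATH, ...) replaces find_elements_by_xpath and headless mode is requested with a command-line flag. The function names chrome_arguments and scrape_nb_bookies are hypothetical, and the XPath is borrowed from the other answer, so it may need adjusting if the page markup has changed.

```python
def chrome_arguments(width=1400, height=800):
    """Return Chrome flags for a headless window of the given size."""
    return ["--headless", f"--window-size={width},{height}"]


def scrape_nb_bookies(url="https://www.coteur.com/cotes-foot.php"):
    """Return the text of the last cell of each match row (hypothetical helper)."""
    # Imported here so chrome_arguments() stays usable without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By

    options = Options()
    for arg in chrome_arguments():
        options.add_argument(arg)

    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # XPath taken from the other answer; assumes the "Nb B" cell is the last <td>.
        cells = driver.find_elements(By.XPATH, '//tr[@id and @role="row"]/td[last()]')
        return [cell.text for cell in cells]
    finally:
        # Always close the browser, even if the page fails to load.
        driver.quit()
```

Calling scrape_nb_bookies() should then return one string per match. Any width above the 1030px breakpoint mentioned above ought to work; 1400 is simply the value used in this answer.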
