
I would like to extract all the league names (e.g. England Premier League, Scotland Premiership, etc.) from this website https://mobile.bet365.com/#type=Splash;key=1;ip=0;lng=1

Taking the inspector tools from Chrome/Firefox I can see that they are located here:

<span>England Premier League</span>

So I tried this

from lxml import html

from selenium import webdriver

session = webdriver.Firefox()
url = 'https://mobile.bet365.com/#type=Splash;key=1;ip=0;lng=1'
session.get(url)
tree = html.fromstring(session.page_source)
leagues = tree.xpath('//span/text()')
print(leagues)

Unfortunately this doesn't return the desired results :-(

To me it looks like the website has different frames and I'm extracting the content from the wrong frame.

Could anyone please help me out here or point me in the right direction? As an alternative if someone knows how to extract the information through their api then this would obviously be the superior solution.

Any help is much appreciated. Thank you!

  • Try to import requests and then parse tree = html.fromstring(requests.get("https://mobile.bet365.com/V6/sport/splash/splash.aspx?zone=0&isocode=RO&tzi=4&key=1&gn=0&cid=1&lng=1&ctg=1&ct=156&clt=8881&ot=2").content) Commented Sep 20, 2017 at 10:06
  • Thank you for your suggestion. Unfortunately this is not a solution for me as I'd need to emulate a real browser session and therefore need to use selenium (requests will not work and any attempt to scrape the content using this library will result in an IP-block from bet365). Also tried your url using selenium which returns an empty list. Commented Sep 20, 2017 at 10:51
  • Text copied from SO sometimes contains hidden characters, so the URL in my comment looks fine but breaks when you copy it. You can take the same URL from my answer. Also check the answer itself: it returns the desired output without extra text, and there is no need for time.sleep() or BeautifulSoup. Commented Sep 20, 2017 at 17:31

2 Answers

Hope you are looking for something like this:

from selenium import webdriver
import bs4
import time

driver = webdriver.Chrome()
url = 'https://mobile.bet365.com/#type=Splash;key=1;ip=0;lng=1'

driver.get(url)
driver.maximize_window()
# Sleep so that the page's JavaScript has time to populate the data
time.sleep(10)
page_source = driver.page_source

soup = bs4.BeautifulSoup(page_source, "html.parser")

# Print the text of every span inside each eventWrapper div
for data in soup.find_all('div', {'class': 'eventWrapper'}):
    for res in data.find_all('span'):
        print(res.text)

It prints the following data:

Wednesday's Matches
International List
Elite Euro List
UK List
Australia List
Club Friendly List
England Premier League
England EFL Cup
England Championship
England League 1
England League 2
England National League
England National League North
England National League South
Scotland Premiership
Scotland League Cup
Scotland Championship
Scotland League One
Scotland League Two
Northern Ireland Reserve League
Scotland Development League East
Wales Premier League
Wales Cymru Alliance
Asia - World Cup Qualifying
UEFA Champions League
UEFA Europa League
Wednesday's Matches
International List
Elite Euro List
UK List
Australia List
Club Friendly List
England Premier League
England EFL Cup
England Championship
England League 1
England League 2
England National League
England National League North
England National League South
Scotland Premiership
Scotland League Cup
Scotland Championship
Scotland League One
Scotland League Two
Northern Ireland Reserve League
Scotland Development League East
Wales Premier League
Wales Cymru Alliance
Asia - World Cup Qualifying
UEFA Champions League
UEFA Europa League

The only problem is that it prints the result set twice.
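If the repeated names are unwanted, one option is to collect the span texts into a list and drop repeats while keeping the original order. This is a minimal sketch, not part of the answer's script: `unique_in_order` is a hypothetical helper, and the `scraped` list is made-up sample data standing in for the texts gathered by the loop above.

```python
def unique_in_order(names):
    """Return names with repeats removed, keeping first-seen order."""
    # dict keys keep insertion order (Python 3.7+), so fromkeys drops
    # later duplicates and keeps the first occurrence of each name.
    return list(dict.fromkeys(names))


# Hypothetical sample mimicking the doubled output shown above.
scraped = [
    "England Premier League",
    "Scotland Premiership",
    "England Premier League",
    "Scotland Premiership",
]

leagues = unique_in_order(scraped)
print(leagues)
```

In the script above, this would mean appending `res.text` to a list inside the loop instead of printing it, then passing that list to the helper.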


3 Comments

Absolutely phenomenal, works perfectly! Many, many thanks. Printing the results twice is not a problem at all.
In fact adding time.sleep(10) to my script also works. Thank you for pointing out this essential part. JS obviously needs some time to populate the data!
The problem is that you are scraping every SPAN on the page which is resulting in too many results... results that you don't want. If you change it to the CSS selector div.podSplashRow :not(.empty), you will return only the list once. You still will get the names of the lists at the top of the page, but I don't see a way to programmatically remove those at first glance.

The required content is absent from the initial page source. It is loaded dynamically from https://mobile.bet365.com/V6/sport/splash/splash.aspx?zone=0&isocode=RO&tzi=4&key=1&gn=0&cid=1&lng=1&ctg=1&ct=156&clt=8881&ot=2

To get this content you can use an explicit wait, as below:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium import webdriver

session = webdriver.Firefox()
url = 'https://mobile.bet365.com/#type=Splash;key=1;ip=0;lng=1'
session.get(url)
WebDriverWait(session, 10).until(EC.presence_of_element_located((By.ID, 'Splash')))

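# Click each collapsed section header to expand it
for collapsed in session.find_elements(By.XPATH, '//h3[contains(@class, "collapsed")]'):
    # Accessing this property scrolls the element into view
    collapsed.location_once_scrolled_into_view
    collapsed.click()

# Print the text of every span inside the eventWrapper divs
for event in session.find_elements(By.XPATH, '//div[contains(@class, "eventWrapper")]//span'):
    print(event.text)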
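For readers wondering how an explicit wait differs from a fixed time.sleep(10): it polls a condition repeatedly and returns as soon as the condition holds, raising an error only if the timeout passes. The sketch below illustrates that idea with the standard library only. `wait_until` and `data_loaded` are hypothetical names for illustration and are not part of Selenium; the real WebDriverWait raises its own TimeoutException rather than the built-in TimeoutError used here.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have passed without success.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll)


# Simulated condition: becomes true on the third check, standing in
# for "the element with ID 'Splash' is now present in the page".
attempts = {"count": 0}


def data_loaded():
    attempts["count"] += 1
    return attempts["count"] >= 3


wait_until(data_loaded, timeout=2.0, poll=0.01)
print("ready after", attempts["count"], "checks")
```

The benefit over a fixed sleep is that the script continues as soon as the data is there, and fails loudly if it never arrives.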

7 Comments

Your locator is only returning the UK leagues... there are other leagues further down the page.
Yep. It's not quite clear which elements the OP actually wants, since "Wednesday's Matches", for example, should not be included among the league names as it's obviously not a league name...
I would be interested in all leagues. It seems your locator only returns leagues that are not collapsed. The "Wednesday's Matches" is not a problem and can be included.
@Baili Any locator is going to return only the leagues that aren't collapsed because they are the only ones that are visible. It's not clear in your question which names you wanted.
@JeffC Thank you. I would need all league names, also those that are collapsed by default when visiting the website. E.g. also Italy Serie A, Italy Serie B, Spain Primera Liga,...