I am using Python Selenium Webdriver to pull some information from the following site: http://www.ukathletics.com/schedule-list/#!/m-basebl/2016

I am interested in pulling some links, dates and team names. I have written the following code, which identifies the correct information. However, it only seems to grab the information up to a certain point and then appends empty items to my lists (i.e. '').

I know that all of the lists should have 66 items if pulled correctly (Kentucky played 66 games). Any ideas why it stops pulling the information after the second LSU game?

from selenium import webdriver

bs = [] #boxscores
team2 = [] #opponents
dates = [] #dates of games
team1 = 'KENTUCKY' #team of interest

driver = webdriver.Chrome()
driver.get('http://www.ukathletics.com/schedule-list/#!/m-basebl/2016')

elem = driver.find_elements_by_class_name('event_link')
for i in elem:
    bs.append(i.get_attribute('href'))
links = sorted(set(bs), key=lambda x: bs.index(x))

elem = driver.find_elements_by_class_name('school_name')
team2 = [i.text for i in elem if i.text!=team1]

elem = driver.find_elements_by_class_name('date')
for i in elem:
    dates.append(i.text.replace(',','').replace('\n',' '))

print(links)
print(team2)
print(dates)
print(len(links))
print(len(team2))
print(len(dates))

MY RESULTS:

['http://www.ukathletics.com/game-center/580644ebe4b07dac0ca58a91/', 'http://www.ukathletics.com/game-center/5806455ce4b07dac0ca58a92/', 'http://www.ukathletics.com/game-center/58064594e4b09266491b651d/', 'http://www.ukathletics.com/game-center/5820d9dbe4b0493932cf30fd/', 'http://www.ukathletics.com/game-center/5820da33e4b0493932cf30fe/', 'http://www.ukathletics.com/game-center/5820da86e4b05e67c64470ca/', 'http://www.ukathletics.com/game-center/5820dabde4b0493932cf30ff/', 'http://www.ukathletics.com/game-center/5820daf4e4b05e67c64470cb/', 'http://www.ukathletics.com/game-center/5820db25e4b05e67c64470cc/', 'http://www.ukathletics.com/game-center/5820db6ce4b0493932cf3100/', 'http://www.ukathletics.com/game-center/5820db91e4b05e67c64470de/', 'http://www.ukathletics.com/game-center/5820dbb6e4b05e67c64470df/', 'http://www.ukathletics.com/game-center/5820dbe3e4b0493932cf3101/', 'http://www.ukathletics.com/game-center/5820dc0de4b05e67c64470e0/', 'http://www.ukathletics.com/game-center/58c1e98ee4b066e02ca82086/', 'http://www.ukathletics.com/game-center/5820dc32e4b05e67c64470e1/', 'http://www.ukathletics.com/game-center/5820dc80e4b0493932cf3102/', 'http://www.ukathletics.com/game-center/5820dcaae4b0493932cf3103/', 'http://www.ukathletics.com/game-center/5820dd1ee4b0493932cf3104/', 'http://www.ukathletics.com/game-center/5820dd6fe4b0493932cf3105/', 'http://www.ukathletics.com/game-center/5820dd8ce4b05e67c64470e3/', 'http://www.ukathletics.com/game-center/5820de21e4b05e67c64470e4/', 'http://www.ukathletics.com/game-center/5820de47e4b0493932cf3106/', 'http://www.ukathletics.com/game-center/5820de69e4b05e67c64470e5/', 'http://www.ukathletics.com/game-center/5820de87e4b0493932cf3107/', 'http://www.ukathletics.com/game-center/5820dea9e4b05e67c64470e6/', 'http://www.ukathletics.com/game-center/5820decee4b0493932cf3108/', 'http://www.ukathletics.com/game-center/5820deebe4b05e67c64470e7/', 'http://www.ukathletics.com/game-center/5820df0ce4b05e67c64470e8/', 
'http://www.ukathletics.com/game-center/5820df50e4b0493932cf3114/', 'http://www.ukathletics.com/game-center/5820df85e4b05e67c64470e9/', 'http://www.ukathletics.com/game-center/5820dfa9e4b05e67c64470ea/', 'http://www.ukathletics.com/game-center/5820dfc7e4b05e67c64470eb/', 'http://www.ukathletics.com/game-center/5820dfebe4b0493932cf3115/', 'http://www.ukathletics.com/game-center/5820e023e4b0493932cf3116/', 'http://www.ukathletics.com/game-center/5820e03ee4b0493932cf3117/', 'http://www.ukathletics.com/game-center/5820e056e4b0493932cf3118/', 'http://www.ukathletics.com/game-center/5820e089e4b0493932cf3119/', 'http://www.ukathletics.com/game-center/5820e0bee4b05e67c64470ed/', 'http://www.ukathletics.com/game-center/5820e0a4e4b05e67c64470ec/']
['NORTH CAROLINA', 'NORTH CAROLINA', 'NORTH CAROLINA', 'LIBERTY', "ST. JOSEPH'S", 'OLD DOMINION', 'DELAWARE', 'E. KENTUCKY', 'WKU', 'UC SANTA BARBARA', 'UC SANTA BARBARA', 'UC SANTA BARBARA', 'WRIGHT STATE', 'CINCINNATI', 'MIAMI (OH)', 'MIAMI (OH)', 'MIAMI (OH)', 'MURRAY STATE', 'TEXAS A&M', 'TEXAS A&M', 'TEXAS A&M', 'WKU', 'OLE MISS', 'OLE MISS', 'OLE MISS', 'CINCINNATI', 'VANDERBILT', 'VANDERBILT', 'VANDERBILT', 'LOUISVILLE', 'MISSISSIPPI STATE', 'MISSISSIPPI STATE', 'MISSISSIPPI STATE', 'UT MARTIN', 'MIZZOU', 'MIZZOU', 'MIZZOU', 'LOUISVILLE', 'LSU', 'LSU', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']
['FRI FEB 17', 'SAT FEB 18', 'SUN FEB 19', 'WED FEB 22', 'FRI FEB 24', 'SAT FEB 25', 'SUN FEB 26', 'TUE FEB 28', 'WED MAR 1', 'FRI MAR 3', 'SAT MAR 4', 'SUN MAR 5', 'TUE MAR 7', 'WED MAR 8', 'THU MAR 9', 'FRI MAR 10', 'SUN MAR 12', 'TUE MAR 14', 'FRI MAR 17', 'SAT MAR 18', 'SUN MAR 19', 'TUE MAR 21', 'THU MAR 23', 'FRI MAR 24', 'SAT MAR 25', 'TUE MAR 28', 'FRI MAR 31', 'SAT APR 1', 'SUN APR 2', 'TUE APR 4', 'FRI APR 7', 'SAT APR 8', 'SUN APR 9', 'WED APR 12', 'FRI APR 14', 'SAT APR 15', 'SUN APR 16', 'TUE APR 18', 'FRI APR 21', 'FRI APR 21', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']
40
120
80
  • Apart from the differing numbers of appended '' entries, all 3 lists contain exactly 40 values: 40 links, 40 games and 40 dates. Seems plausible. Commented Aug 8, 2017 at 17:32
  • I see the pattern you are describing, and that was my original thought as well. However, when I inspect elements on the page (e.g. the 3rd LSU game), the link, date and team name are all there, but the code doesn't grab them... Commented Aug 8, 2017 at 17:35
  • You are right, the website itself has its elements in order it seems. Could you try checking the shape of elem at every step? Commented Aug 8, 2017 at 17:55
  • I just restructured my code to pass the page_source object to BeautifulSoup and try parsing from there. The same issue is still occurring. Here is the link elem shape: <a class="event_link" href="/game-center/5820e0a4e4b05e67c64470ec/" ng-href="/game-center/5820e0a4e4b05e67c64470ec/" target="_self"></a> Here is the date element shape: <p class="date"> <span class="weekday_month" ng-bind="dateContext(event)">Sat, Feb</span> <span class="calendar_day" ng-bind="dateNumber(event)">18</span> </p> Commented Aug 8, 2017 at 17:59
  • I was referring more to whether something happens in the for i in elem loop or whether the correct amount of elements was never found in the first place Commented Aug 8, 2017 at 18:10

1 Answer

Not all of the elements are fetched because they have not been loaded yet. If you look carefully, the rows at the bottom of the table load only once you scroll down to the end of the page.

Try adding the code below after opening the page so that the complete table loads.

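import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

driver = webdriver.Chrome()
driver.get('http://www.ukathletics.com/schedule-list/#!/m-basebl/2016')
time.sleep(5)  # wait for the initial page load
driver.find_element_by_tag_name('body').send_keys(Keys.CONTROL + Keys.END)  # jump to the bottom
time.sleep(5)  # wait for the lazily loaded rows
driver.find_element_by_tag_name('body').send_keys(Keys.CONTROL + Keys.END)  # scroll again in case the page grew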
  • A wait is added so the page has time to load.
  • The scroll is performed twice to make sure the actual bottom of the table loads, in case the table is long.
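
A fixed number of scrolls may fall short on a longer table. As a rough sketch of an alternative (not what was tested for this answer), one could keep scrolling until the number of matched elements stops growing. The helper name `scroll_until_stable` and its parameters `scroll`, `count`, `pause` and `max_rounds` are made up for illustration. The commented wiring at the bottom assumes the same `driver`, `Keys` and class name used elsewhere on this page and is untested.

```python
import time


def scroll_until_stable(scroll, count, pause=1.0, max_rounds=20):
    """Call scroll() until count() stops increasing; return the final count.

    Hypothetical helper: `scroll` and `count` are caller-supplied callables.
    """
    previous = count()
    for _ in range(max_rounds):
        scroll()
        time.sleep(pause)  # give lazily loaded rows time to appear
        current = count()
        if current == previous:  # nothing new loaded, assume the bottom was reached
            break
        previous = current
    return previous


# Possible wiring with the driver from this page (assumed, untested):
# body = driver.find_element_by_tag_name('body')
# total = scroll_until_stable(
#     scroll=lambda: body.send_keys(Keys.CONTROL + Keys.END),
#     count=lambda: len(driver.find_elements_by_class_name('event_link')),
# )
```

If the page loads slowly, a short `pause` could make the loop stop before new rows have arrived, so the pause length is a judgment call.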

I have tested it, and it gives the output below:

66    #print(len(links))
198   #print(len(team2))
132   #print(len(dates))
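
The counts for team2 and dates are multiples of 66 rather than 66 itself. In the question's output the surplus entries were empty strings, so, assuming that still holds once the full table is loaded, a small filter may bring each list down to one entry per game. The helper names `drop_blank` and `unique_in_order` below are hypothetical, and the sample values are copied from the question's output only to show the shape of the data.

```python
def drop_blank(values):
    """Return the non-empty, non-whitespace strings from values, in order."""
    return [v for v in values if v and v.strip()]


def unique_in_order(values):
    """Remove duplicates while keeping first-seen order (dicts keep insertion order in Python 3.7+)."""
    return list(dict.fromkeys(values))


# Sample shaped like the tail of the question's lists.
sample_teams = ['LOUISVILLE', 'LSU', 'LSU', '', '', '']
sample_links = ['/game-center/a/', '/game-center/a/', '/game-center/b/']

teams = drop_blank(sample_teams)       # repeated opponents are kept, blanks are dropped
links = unique_in_order(sample_links)  # each boxscore link is kept once
```

`unique_in_order` does the same job as the `sorted(set(bs), key=...)` line in the question. It should only be applied to the links, since opponents and dates legitimately repeat.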