I'm trying to extract some useful information from a website. I got part of the way, but now I'm stuck and need your help!
I need the information from this table:
http://gbgfotboll.se/serier/?scr=scorers&ftid=57700
I wrote this code and I got the information that I wanted:
import lxml.html
from lxml.etree import XPath
url = ("http://gbgfotboll.se/serier/?scr=scorers&ftid=57700")
rows_xpath = XPath("//*[@id='content-primary']/div[1]/table/tbody/tr")
name_xpath = XPath("td[1]//text()")
team_xpath = XPath("td[2]//text()")
league_xpath = XPath("//*[@id='content-primary']/h1//text()")
html = lxml.html.parse(url)
divName = league_xpath(html)[0]
for id,row in enumerate(rows_xpath(html)):
    scorername = name_xpath(row)[0]
    team = team_xpath(row)[0]
    print scorername, team
print divName
But I also get this error:
scorername = name_xpath(row)[0]
IndexError: list index out of range
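For reference, the error just means that the XPath result list was empty for some row, so `[0]` had nothing to index. A guard like the one below would silence it. This is only a minimal sketch on plain lists, and `first_or_none` is a hypothetical helper name, not part of lxml:

```python
def first_or_none(results):
    """Return the first item of a result list, or None if the list is empty."""
    # An XPath call returns a list; indexing [0] on an empty list raises IndexError.
    return results[0] if results else None


# Plain lists stand in for XPath results here (assumed sample values).
print(first_or_none(["Some Player"]))
print(first_or_none([]))
```

That only hides the symptom, though, and it is not what I am asking about.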
I do understand why I get the error. What I really need help with is that I only need the first 12 rows. This is what the extraction should do in these three possible scenarios:
If there are fewer than 12 rows: take all the rows except the last row.
If there are exactly 12 rows: same as above, take all the rows except the last row.
If there are more than 12 rows: simply take the first 12 rows.
How can I do this?
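To make the three cases concrete, the behaviour I am after would look roughly like the sketch below on a plain Python list. The names `select_rows` and `MAX_ROWS` are hypothetical, and I am assuming the same logic could be applied to the list returned by `rows_xpath(html)`:

```python
MAX_ROWS = 12  # hypothetical constant: the most rows I ever want


def select_rows(rows, max_rows=MAX_ROWS):
    """Pick rows according to the three scenarios described above."""
    if len(rows) > max_rows:
        # More than 12 rows: simply take the first 12.
        return rows[:max_rows]
    # 12 rows or fewer: take everything except the last row.
    return rows[:-1]


# Integers stand in for table rows in this example.
print(len(select_rows(list(range(5)))))
print(len(select_rows(list(range(12)))))
print(len(select_rows(list(range(20)))))
```

What I don't know is whether this kind of slicing is the right way to do it with lxml, or whether the XPath itself should limit the rows.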
EDIT1
It is not a duplicate. Sure, it is the same site, but I have already done what that question asked for, which was to get all the values from each row. I can already do that. What I need is to skip the last row and to extract no more than 12 rows, even if there are more.
