
I'm trying to extract some useful information from a website. I've made some progress, but now I'm stuck and need your help!

I need the information from the table on this page:

http://gbgfotboll.se/serier/?scr=scorers&ftid=57700

I wrote this code, and it gets the information I want:

```python
import lxml.html
from lxml.etree import XPath

url = "http://gbgfotboll.se/serier/?scr=scorers&ftid=57700"

# Rows of the scorers table, plus paths (relative to a row) to the name and team cells
rows_xpath = XPath("//*[@id='content-primary']/div[1]/table/tbody/tr")
name_xpath = XPath("td[1]//text()")
team_xpath = XPath("td[2]//text()")

# Page heading that holds the league/division name
league_xpath = XPath("//*[@id='content-primary']/h1//text()")

html = lxml.html.parse(url)

divName = league_xpath(html)[0]

for row in rows_xpath(html):
    # Each XPath call returns a list of text nodes; [0] takes the first one
    scorername = name_xpath(row)[0]
    team = team_xpath(row)[0]
    print(scorername, team)

print(divName)
```

However, I also get this error:

```
    scorername = name_xpath(row)[0]
IndexError: list index out of range
```

I understand why I get the error. What I really need help with is limiting the extraction to the first 12 rows. This is what it should do in each of three possible scenarios:

If there are fewer than 12 rows: take all the rows except THE LAST ROW.

If there are exactly 12 rows: same as above.

If there are more than 12 rows: simply take the first 12 rows.

How can I do this?

EDIT1

It is not a duplicate. Sure, it is the same site, but I have already done what that question asked for, which was to get all the values from each row. What I need is to skip the last row and to avoid extracting more than 12 rows when there are more.

  • possible duplicate of Extracting information from a table on a website using python, LXML & XPATH Commented Apr 12, 2015 at 23:28
  • @felipsmartins it's not a duplicate, check my edit Commented Apr 12, 2015 at 23:32
  • Ok, I'll put my answer soon. Commented Apr 12, 2015 at 23:36
  • Perfect, anxious to see how you solve it :D @felipsmartins Commented Apr 12, 2015 at 23:52
  • I've just posted my answer. Take a look at it! Commented Apr 13, 2015 at 0:31

2 Answers


I think this is what you want:

```python
# coding: utf-8
from lxml import etree
import lxml.html

collected = []  # one list of cell texts per row: [[col1, col2, ...], ...]
dom = lxml.html.parse("http://gbgfotboll.se/serier/?scr=scorers&ftid=57700")

# Evaluator for XPath expressions against the whole document
xpatheval = etree.XPathDocumentEvaluator(dom)
# All rows of the scorers table
rows = xpatheval('//div[@id="content-primary"]/div/table[1]/tbody/tr')

if len(rows) <= 12:
    # 12 rows or fewer: take all the rows except the last
    # (slicing instead of pop() also copes with an empty list)
    rows = rows[:-1]
else:
    # More than 12 rows: simply take the first 12 rows
    rows = rows[:12]

for row in rows:
    # All columns of the current table row
    # (Spelare = player, Lag = team, Mål = goals, Straffmål = penalty goals)
    columns = row.findall("td")
    # Pick the textual data from each <td>
    collected.append([column.text for column in columns])

for row_data in collected:
    print(row_data)
```

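To try the same row-selection logic offline (the live page may well have changed since 2015), here is a minimal sketch that uses only the standard library's `xml.etree.ElementTree` on a small, hypothetical, well-formed sample table. The sample markup, `SAMPLE_HTML`, `LIMIT`, and `extract_table` are made up for illustration; real-world HTML usually isn't well-formed XML, which is why the answer above uses `lxml.html`. The sketch also uses `itertext()` rather than `.text`, because `.text` is `None` when a cell's content is wrapped in a child element such as a link.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the scorers page (not the real markup)
SAMPLE_HTML = """
<html><body>
  <div id="content-primary">
    <div>
      <table><tbody>
        <tr><td><a href="#">Player A</a></td><td>Team 1</td></tr>
        <tr><td>Player B</td><td>Team 2</td></tr>
        <tr><td>Player C</td><td>Team 3</td></tr>
        <tr><td>(last row, to be skipped)</td><td></td></tr>
      </tbody></table>
    </div>
  </div>
</body></html>
"""

LIMIT = 12  # maximum number of rows to keep


def extract_table(markup, limit=LIMIT):
    """Return the selected table rows as lists of cell texts."""
    root = ET.fromstring(markup.strip())
    # ElementTree supports a limited XPath subset, including [@attr='value']
    rows = root.findall(".//div[@id='content-primary']/div/table/tbody/tr")
    if len(rows) <= limit:
        rows = rows[:-1]  # `limit` rows or fewer: drop the last row
    else:
        rows = rows[:limit]  # more than `limit` rows: keep the first `limit`
    # itertext() also collects text nested in child elements such as <a>
    return [["".join(td.itertext()).strip() for td in row.findall("td")]
            for row in rows]


for row_data in extract_table(SAMPLE_HTML):
    print(row_data)
```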


1 Comment

Absolutely perfect! I applied your techniques to my own code. Thank you so much.

This is how you can get the rows you need, based on what you described in your post. It's just the logic, assuming `rows` is a list; you'll have to incorporate it into your code as needed.

```python
if len(rows) <= 12:
    print(rows[0:-1])  # 12 rows or fewer: all rows except the last
else:
    print(rows[0:12])  # more than 12 rows: only the first 12
```
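The two branches can also be folded into a single slice: keep at most 12 rows, and never more than `len(rows) - 1`. A minimal sketch; the function name `select_rows` and its `limit` parameter are hypothetical, not taken from either answer:

```python
def select_rows(rows, limit=12):
    """Apply the question's three rules with one slice.

    fewer than `limit` rows -> all but the last
    exactly `limit` rows    -> all but the last
    more than `limit` rows  -> the first `limit`
    """
    # max(..., 0) stops an empty input from turning into rows[:-1]
    return rows[:max(min(len(rows) - 1, limit), 0)]


print(select_rows(list(range(5))))        # [0, 1, 2, 3]
print(len(select_rows(list(range(12)))))  # 11
print(len(select_rows(list(range(20)))))  # 12
print(select_rows([]))                    # []
```

With exactly 13 rows both readings of the rules agree: the first 12 rows are also "all rows except the last".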

3 Comments

But this just prints out the elements. I don't see how I can access the individual elements like I do in my code.
@AppDev I used print there, but you can do whatever you need with the result. This answers the question in your post: "If there are fewer than 12 rows: take all the rows except THE LAST ROW. If there are 12 rows: same as above. If there are more than 12 rows: simply take the first 12 rows. How can I do this?"
@AppDev Instead of printing, assign the slice to a variable, e.g. x = rows[0:-1] or x = rows[0:12]. Then you can iterate through x and access the individual elements.
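To illustrate that last comment: the slice is an ordinary list, so you can bind it to a name and loop over it. In the question's code each item would be an lxml row element passed to `name_xpath`/`team_xpath`; in this minimal sketch, as an assumption, the rows are plain `(name, team)` tuples with made-up data so it runs on its own.

```python
# Hypothetical data standing in for the table rows: (scorer, team) pairs,
# followed by a final row that should be skipped
rows = [("Player A", "Team 1"), ("Player B", "Team 2"), ("Player C", "Team 3"),
        ("Last row", "")]

if len(rows) <= 12:
    x = rows[0:-1]  # 12 rows or fewer: everything except the last row
else:
    x = rows[0:12]  # more than 12 rows: the first 12

# x is an ordinary list, so each element can be accessed individually
for scorername, team in x:
    print(scorername, team)
```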
