I am trying to parse data from a website. For example, a portion of the page source for the site I am trying to extract data from looks like this:

<table summary="Customer Pending and Vendor Pending Table">
  <tr>
        <th> <a href="/team.php?wrkgrp=Somedata&Escalationsort=Level&Escalationorder=0#Escalation" class="headlink">
  <img src="/images/rat/up_selected.png" width="11" height="9" border="0" alt="up">Risk          </a> </th>
        <th> <a href="/team.php?wrkgrp=Somedata&Escalationsort=AvgLastUpd&Escalationorder=1#Escalation" class="headlink">
  Avg Last Updated          </a> </th>
        <th> <a href="/team.php?wrkgrp=Somedata&Escalationsort=AvgDaysOpen&Escalationorder=1#Escalation" class="headlink">
  Avg Days Open          </a> </th>
        <th> <a href="/team.php?wrkgrp=Somedata&Escalationsort=Srs&Escalationorder=1#Escalation" class="headlink">
  # of Cases          </a> </th>
        <th> <a href="/team.php?wrkgrp=Somedata&Escalationsort_pct=1&Escalationorder=1#Escalation" class="headlink">% of Total Cases</a> </th>
  </tr>
        <tr >
  <td><a href="/snapshot.php?statusrisk=2&wrkgrp=Somedata&function=statusrisk&statuses=CustomerPending"><img src="/images/rat/severity_2.gif" alt="Very High Risk" title="Very High Risk" border="0"></a></td>
    <td>   8.0</td>
    <td>  69.0</td>
    <td>1</td>
    <td>   3.1</td>
  </tr>

I need to extract the values 8.0, 69.0 and 3.1 from the above table. My Python code looks like this:

from lxml import html
import requests

page = requests.get('http://rat-sucker.abc.com/team.php?wrkgrp=somedata')
tree = html.fromstring(page.text)
Stats = tree.xpath('//*[@id="leftrat"]/table[1]/tbody/tr[2]/td[2]')

print 'Stats: ', Stats

I have checked my XPath using several methods and Xcode simulator, and it is correct (it may not work if you run it against the partial source above), but when my Python script is run it does not generate any output.

[root@testbed testhost]# python scrapper.py Stats

[root@testbed testhost]#

Comment: http://rat-sucker.abc.com/team.php?wrkgrp=somedata does not lead anywhere. Can you add the actual URL? (Commented Feb 10, 2015 at 14:38)

1 Answer

You could use the BeautifulSoup parser.

>>> from bs4 import BeautifulSoup
>>> s = '''<table summary="Customer Pending and Vendor Pending Table">
  <tr>
        <th> <a href="/team.php?wrkgrp=Somedata&Escalationsort=Level&Escalationorder=0#Escalation" class="headlink">
  <img src="/images/rat/up_selected.png" width="11" height="9" border="0" alt="up">Risk          </a> </th>
        <th> <a href="/team.php?wrkgrp=Somedata&Escalationsort=AvgLastUpd&Escalationorder=1#Escalation" class="headlink">
  Avg Last Updated          </a> </th>
        <th> <a href="/team.php?wrkgrp=Somedata&Escalationsort=AvgDaysOpen&Escalationorder=1#Escalation" class="headlink">
  Avg Days Open          </a> </th>
        <th> <a href="/team.php?wrkgrp=Somedata&Escalationsort=Srs&Escalationorder=1#Escalation" class="headlink">
  # of Cases          </a> </th>
        <th> <a href="/team.php?wrkgrp=Somedata&Escalationsort_pct=1&Escalationorder=1#Escalation" class="headlink">% of Total Cases</a> </th>
  </tr>
        <tr >
  <td><a href="/snapshot.php?statusrisk=2&wrkgrp=Somedata&function=statusrisk&statuses=CustomerPending"><img src="/images/rat/severity_2.gif" alt="Very High Risk" title="Very High Risk" border="0"></a></td>
    <td>   8.0</td>
    <td>  69.0</td>
    <td>1</td>
    <td>   3.1</td>
  </tr>'''
>>> soup = BeautifulSoup(s, 'html.parser')
>>> [i.text.strip() for i in soup.find_all('td', text=True)]
['8.0', '69.0', '1', '3.1']
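
If you would rather stay with lxml, one possible cause of the empty result is the tbody step in the XPath. Browsers usually insert a tbody element into the DOM they display, while the raw HTML that requests downloads (and that lxml parses) often has none, as in the source shown in the question. The sketch below is only an illustration under assumptions: the wrapping div with id="leftrat" is guessed from the XPath in the question, the header row is simplified, and the code uses Python 3 print syntax.

```python
from lxml import html

# Cut-down copy of the table from the question. The <div id="leftrat">
# wrapper is an assumption taken from the asker's XPath, not from the page.
SNIPPET = """<html><body>
<div id="leftrat">
<table summary="Customer Pending and Vendor Pending Table">
  <tr>
    <th>Risk</th><th>Avg Last Updated</th><th>Avg Days Open</th>
    <th># of Cases</th><th>% of Total Cases</th>
  </tr>
  <tr>
    <td><a href="/snapshot.php?statusrisk=2"><img src="/images/rat/severity_2.gif" alt="Very High Risk"></a></td>
    <td>   8.0</td>
    <td>  69.0</td>
    <td>1</td>
    <td>   3.1</td>
  </tr>
</table>
</div>
</body></html>"""

tree = html.fromstring(SNIPPET)

# XPath as copied from a browser: it contains a tbody step that does not
# exist in the raw markup, so nothing matches.
with_tbody = tree.xpath('//*[@id="leftrat"]/table[1]/tbody/tr[2]/td[2]')

# Same path without tbody: selects every cell of the second row.
cells = tree.xpath('//*[@id="leftrat"]/table[1]/tr[2]/td')
values = [cell.text_content().strip() for cell in cells]

# Keep the 2nd, 3rd and 5th columns (the "# of Cases" column is skipped).
wanted = [float(values[i]) for i in (1, 2, 4)]

print('With tbody:', with_tbody)
print('Stats:', wanted)
```

If the real page does contain a tbody in its source, the original path should work and the cause lies elsewhere, so it is worth printing page.text and checking the markup that was actually downloaded.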