
I'm learning web scraping using Python. Here is my first Python code:

# encoding=utf8
import urllib2
from bs4 import BeautifulSoup


soup = BeautifulSoup(urllib2.urlopen("http://www.bcsfootball.org/").read(),"lxml")

for row in soup("table", {'class': "mod-data"})[0].tbody("tr"):
    tds = row('td')
    print tds[0].string, tds[1].string

I'm getting this error:

/usr/bin/python2.7 /home/NewYork/PycharmProjects/untitled/News.py
Traceback (most recent call last):
  File "/home/NewYork/PycharmProjects/untitled/News.py", line 8, in <module>
    for row in soup("table", {'class': "mod-data"})[0].tbody("tr"):
IndexError: list index out of range

Can anyone tell me what I am doing wrong?

One more thing: could you help me understand what exactly is happening in this line?

for row in soup("table", {'class': "mod-data"})[0].tbody("tr"):

Thanks!! :)

  • Learning web scraping using Python is all well and good, but you will also need to learn Python per se, or you'll get stuck on error messages like this. This particular one means that the list returned by the soup() call was empty, and therefore does not have a first element. Commented Jul 3, 2016 at 5:52
  • Your soup call throws an error: UnicodeEncodeError: 'ascii' codec can't encode character '\xa0' in position 10082: ordinal not in range(128) Commented Jul 3, 2016 at 5:53
  • unicode literal..utf-8 encoding Commented Jul 3, 2016 at 5:54
  • You're trying to use index 0 on an element that is not there: soup("table", {'class': "mod-data"}) returns an empty list, so [0] fails before you ever start iterating, and you never check for that case. Commented Jul 3, 2016 at 6:03
  • If you open the URL you're requesting and view the page source, you'll find no table tags and no class named mod-data. Commented Jul 3, 2016 at 6:09

2 Answers


The error message means that soup("table", {'class': "mod-data"}) returns an empty list, so there is no first element for [0] to get.

You should check that the page actually contains a table element with the class "mod-data" before indexing into the result.
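To unpack the line the question asks about, here is a minimal sketch written for Python 3 (the question uses Python 2's urllib2) with BeautifulSoup 4 and the built-in "html.parser". The sample markup and the helper name table_rows are made up for illustration. soup("table", {...}) is shorthand for soup.find_all(...) and returns a possibly empty list; [0] takes the first match; .tbody is the first <tbody> inside it; and calling that tag with "tr" finds all rows. The sketch checks each step instead of letting [0] raise IndexError.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for a downloaded page.
SAMPLE_HTML = """
<table class="mod-data">
  <tbody>
    <tr><td>1</td><td>Team A</td></tr>
    <tr><td>2</td><td>Team B</td></tr>
  </tbody>
</table>
"""


def table_rows(html, css_class="mod-data"):
    """Return (first cell, second cell) text pairs from the first matching table.

    Hypothetical helper: returns [] instead of raising IndexError when the
    page has no <table> with the given class.
    """
    soup = BeautifulSoup(html, "html.parser")

    # soup("table", {...}) is the same as soup.find_all("table", {...}):
    # a list of every matching <table>, possibly empty.
    tables = soup("table", {"class": css_class})
    if not tables:
        return []  # this is the case that made [0] fail in the question

    # [0] is the first matching table; .tbody is the first <tbody> inside it.
    tbody = tables[0].tbody
    if tbody is None:
        return []

    rows = []
    # Calling a tag like a function is find_all(): here, all <tr> in the tbody.
    for row in tbody("tr"):
        tds = row("td")  # all <td> cells in this row
        if len(tds) >= 2:
            rows.append((tds[0].get_text(strip=True), tds[1].get_text(strip=True)))
    return rows


print(table_rows(SAMPLE_HTML))
print(table_rows("<p>No table here</p>"))
```

get_text(strip=True) is used instead of .string because .string returns None when a cell contains more than one child node.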


1 Comment

Checked now, and it is not there. But what would the tag expression in the code above be if I want to extract the welcome paragraph from the website?

This would give you the expected result:

import urllib2
from bs4 import BeautifulSoup


soup = BeautifulSoup(urllib2.urlopen("http://www.bcsfootball.org").read(),"html")

welcome = soup("div", {'class': "col-full"})[1] # we know it's index 1


for item in welcome:
   print item.string
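The answer's code hard-codes index 1, so it would raise the same IndexError if the page had fewer "col-full" divs. A minimal Python 3 sketch of the same idea with a guard, assuming BeautifulSoup 4 and "html.parser"; the helper name welcome_text and the sample markup are hypothetical, and whether index 1 is still the welcome block depends on the site's current markup.

```python
from bs4 import BeautifulSoup


def welcome_text(html, index=1):
    """Return the text fragments inside the index-th <div class="col-full">.

    Hypothetical helper: returns None instead of raising IndexError when the
    page has fewer matching divs than expected (e.g. after a redesign).
    """
    soup = BeautifulSoup(html, "html.parser")
    divs = soup("div", {"class": "col-full"})  # same as soup.find_all(...)
    if len(divs) <= index:
        return None
    # stripped_strings yields every text fragment in the div, whitespace-trimmed.
    return list(divs[index].stripped_strings)


# Hypothetical sample markup standing in for the real page.
SAMPLE_HTML = """
<div class="col-full">Header</div>
<div class="col-full"><h2>Welcome</h2><p>Sample welcome paragraph.</p></div>
"""

print(welcome_text(SAMPLE_HTML))
print(welcome_text("<div class='col-full'>only one</div>"))
```

Against the live site, the html argument would be the bytes from urllib.request.urlopen("http://www.bcsfootball.org").read().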

