
I am working with BeautifulSoup and I keep getting the error "'continue' not properly in loop". When I deleted the continue, I got an "invalid syntax" error for my print statement instead. I am running BS4 on Python 2.7.5. All help is greatly appreciated. Here is my code:

from bs4 import BeautifulSoup

soup = BeautifulSoup (open("43rd-congress.html"))

final_link = soup.p.a
final_link.decompose()

trs = soup.find_all('tr')

for tr in trs:
for link in tr.find_all('a'):
    fulllink = link.get('href')
    print fulllink #print in terminal to verify results

tds = tr.find_all("td")


try: #we are using "try" because the table is not well formatted. 
   names = str(tds[0].get_text()) 
   years = str(tds[1].get_text())
   positions = str(tds[2].get_text())
   parties = str(tds[3].get_text())
   states = str(tds[4].get_text())
   congress = tds[5].get_text()

except:
  print "bad tr string"
  continue 

print names, years, positions, parties, states, congress
  • What do you expect the continue to do here? Commented Oct 25, 2013 at 16:56
  • Your code isn't formatted correctly. Can you format it as it should? @MartijnPieters I believe the whole part of the code below the first for is wrongly nested. Commented Oct 25, 2013 at 16:58
  • Is everything after for tr in trs: supposed to be in that loop? Please indent accordingly. Commented Oct 25, 2013 at 17:02

2 Answers

Since you are getting that error, I believe the indentation in your file really is wrong. Your code should probably look like this:

from bs4 import BeautifulSoup

soup = BeautifulSoup (open("43rd-congress.html"))

final_link = soup.p.a
final_link.decompose()

trs = soup.find_all('tr')

for tr in trs:

    for link in tr.find_all('a'):
        fulllink = link.get('href')
        print fulllink #print in terminal to verify results

    tds = tr.find_all("td")


    try:  # we are using "try" because the table is not well formatted
        names = str(tds[0].get_text())
        years = str(tds[1].get_text())
        positions = str(tds[2].get_text())
        parties = str(tds[3].get_text())
        states = str(tds[4].get_text())
        congress = tds[5].get_text()

        print names, years, positions, parties, states, congress

    except Exception:  # e.g. IndexError when a row has fewer than six cells
        print "bad tr string"

In Python, each block of code must be indented consistently, using either tabs or spaces. Mixing the two isn't a good idea.

In your code, you have an outer for loop that walks over every tr and an inner one that prints every URL.

But you forgot to indent the code that should be inside the outer for loop, so the continue ended up outside any loop.
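To see why indentation alone decides whether continue is legal, here is a minimal sketch. The two source strings and the helper name `compiles` are made up for illustration; the `__future__` import is only there so the snippet also runs on Python 3.

```python
from __future__ import print_function  # lets print() behave the same on 2.7 and 3

# Hypothetical snippets: identical statements, only the indentation differs.
FLAT = "for n in [1, 2]:\n    pass\ncontinue\n"        # continue at top level
NESTED = "for n in [1, 2]:\n    pass\n    continue\n"  # continue inside the loop


def compiles(source):
    """Return None if the source compiles, otherwise the SyntaxError message."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as err:
        return err.msg
    return None


print(compiles(FLAT))    # an error message that mentions continue
print(compiles(NESTED))  # None, because the indented version is valid
```

The only difference between the two strings is the four spaces in front of continue, which is the same difference as between the code in the question and the code in this answer.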

Edit

Also, you don't need continue in your case: with the print moved inside the try block, nothing in the loop body follows the except. See the edited code above.
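If you do want to keep continue, for example because more code follows the except, it only has to sit inside the loop body. A rough sketch, using plain lists as a stand-in for the cells of each tr (the `rows` data and the `parse_rows` name are placeholders, not taken from your page):

```python
from __future__ import print_function  # lets print() behave the same on 2.7 and 3

# Placeholder data standing in for the text of each row's <td> cells.
rows = [
    ["name-a", "years-a", "position-a", "party-a", "state-a", "congress-a"],
    ["only one cell"],  # a badly formatted row
    ["name-b", "years-b", "position-b", "party-b", "state-b", "congress-b"],
]


def parse_rows(rows):
    """Collect six-field records, skipping rows that have too few cells."""
    parsed = []
    for cells in rows:
        try:
            record = (cells[0], cells[1], cells[2],
                      cells[3], cells[4], cells[5])
        except IndexError:
            print("bad tr string")
            continue  # legal here: we are inside the for loop
        # Only reached when the row had all six cells.
        parsed.append(record)
    return parsed


for record in parse_rows(rows):
    print(*record)
```

Here continue is useful because the append after the try/except must be skipped for bad rows.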


The indentation looks off on the print/continue lines. If it is off, the except: block will look empty, and Python requires at least one indented statement there.

Try commenting out everything not related to the try/except and see if it still gives you the error.
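A quick way to check the empty-except theory without touching your file is to compile two tiny snippets. This is only a sketch; the snippet strings and the `check` helper are invented for the example.

```python
from __future__ import print_function  # lets print() behave the same on 2.7 and 3

# Hypothetical snippets: the print is either outside or inside the except block.
EMPTY_EXCEPT = "try:\n    x = 1\nexcept:\nprint('bad tr string')\n"
FILLED_EXCEPT = "try:\n    x = 1\nexcept:\n    print('bad tr string')\n"


def check(source):
    """Return 'ok' if the source compiles, else the name of the error raised."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as err:  # IndentationError is a subclass of SyntaxError
        return type(err).__name__
    return "ok"


print(check(EMPTY_EXCEPT))
print(check(FILLED_EXCEPT))
```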
