0

I'm trying to figure out how to apply a for-loop to this script and I'm having a lot of trouble. I want to iterate through a list of subdomains which are stored in csv format (ie: one column with 20 subdomains) and print the html for each. They all have the same SourceDomain. Thanks!

#Python 2.6
from selenium import selenium
import unittest, time, re, csv, logging

class Untitled(unittest.TestCase):
    def setUp(self):
        self.verificationErrors = []
        self.selenium = selenium("localhost", 4444, "*firefox", "http://www.SourceDomain.com")
        self.selenium.start()

    def test_untitled(self):
        sel = self.selenium
        sel.open("/dns/www.subdomains.com.html")
        sel.wait_for_page_to_load("30000")
        html = sel.get_html_source()
        print html

    def tearDown(self):
        self.selenium.stop()
        self.assertEqual([], self.verificationErrors)

if __name__ == "__main__":
    unittest.main()

1 Answer 1

3
#Python 2.6
from selenium import selenium
import unittest, time, re, csv, logging

class Untitled(unittest.TestCase):
    def setUp(self):
        self.verificationErrors = []
        self.selenium = selenium("localhost", 4444, "*firefox", "http://www.SourceDomain.com")
        self.selenium.start()

    def test_untitled(self):
        sel = self.selenium
        spamReader = csv.reader(open('your_file.csv'))
        for row in spamReader:
            sel.open(row[0])
            sel.wait_for_page_to_load("30000")
            print sel.get_html_source()

    def tearDown(self):
        self.selenium.stop()
        self.assertEqual([], self.verificationErrors)

if __name__ == "__main__":
    unittest.main()

BTW, notice there's no need to place this script wrapped inside a unittest testcase. Even better, you don't need selenium for such a simple task (at least at first sight).

Try this:

import urllib2, csv

def fetchsource(url):
    page = urllib2.urlopen(url)
    source = page.read()
    return source

fooReader = csv.reader(open('your_file.csv'))
for url in fooReader:
    print fetchsource(url)
Sign up to request clarification or add additional context in comments.

3 Comments

Thanks - I'm trying to test your first answer now. I couldnt get urllib2 to work because these pages use a LOT of javaScript. For which, Alex Martelli advised me to use Selenium.
I kept getting syntax error because I forgot the second closing bracket on (open('your_file.csv')) :-P It works! Thank you!
Ah, that's one of the small reasons for which you would use selenium in thins kind of tasks. Glad to see it helped.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.