
I keep running into walls. Can anybody tell me how to crawl multiple pages from one website using Selenium without having to repeat my code over and over?

Here is my current code:

RegionIDArray = ['de/7132/New-York-City/d687-allthingstodo',  'de/7132/London/d737-allthingstodo']

class Crawling(unittest.TestCase):
 def setUp(self):
     self.driver = webdriver.Firefox()
     self.driver.set_window_size(10, 10)
     self.base_url = "http://www.jsox.de/"
     self.accept_next_alert = True


 def test_sel(self):
     driver = self.driver
     delay = 3
     for reg in RegionIDArray:
        page = 0
     driver.get(self.base_url + str(reg))
     for i in range(1,4):
         driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
         time.sleep(2)

If I run this code, I only get the results for London, not for the first city in the list, New York.

I could do this manually by repeating my code for each page, crawling them one by one, and then concatenating the resulting dataframes, but that seems very unpythonic. Does anyone have a faster way or any advice?

Any feedback is appreciated:)

EDIT

I modified my code according to Anil's comment. Selenium now opens the pages for both New York and London, but it only returns the results for London. Any idea what the reason could be?

Modified code:

 RegionIDArray = ['de/7132/New-York-City/d687-allthingstodo', 'de/7132/London/d737-allthingstodo']


 class Crawling(unittest.TestCase):
     def setUp(self):
         self.driver = webdriver.Firefox()
         self.driver.set_window_size(10, 10)
         self.base_url = "http://www.jsox.de/"
         self.accept_next_alert = True


     def test_sel(self):
         driver = self.driver
         delay = 3
         for reg in RegionIDArray:
             page = 0
             driver.get(self.base_url + str(reg))
             for i in range(1,4):
             driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
             time.sleep(2)
  • Your driver.get() is outside the for loop. (Commented Dec 14, 2015 at 13:25)

2 Answers

1

Your for loop

for reg in RegionIDArray:
    page = 0

loops through all list items, but its body contains only page = 0. When it exits, reg still points to the last item, i.e. London, so driver.get() runs once, for London only. That is why you get only the last item.

Instead, put the driver calls inside the for loop:

def test_sel(self):
     driver = self.driver
     delay = 3
     for reg in RegionIDArray:
         page = 0
         driver.get(self.base_url + str(reg))
         for i in range(1,4):
             driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
             time.sleep(2)
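
To see the effect without starting a browser, here is a minimal sketch in plain Python. The names regions, visit_outside_loop and visit_inside_loop are made up for illustration, and appending to a list stands in for driver.get():

```python
# Hypothetical stand-in for the question's RegionIDArray.
regions = ["New-York-City", "London"]

def visit_outside_loop(regions):
    """Mimics the original code: the 'get' call sits after the loop."""
    visited = []
    for reg in regions:
        page = 0  # the only statement in the loop body
    visited.append(reg)  # runs once, with reg still bound to the last item
    return visited

def visit_inside_loop(regions):
    """Mimics the fixed code: the 'get' call is part of the loop body."""
    visited = []
    for reg in regions:
        visited.append(reg)  # runs once per region
    return visited

print(visit_outside_loop(regions))  # ['London']
print(visit_inside_loop(regions))   # ['New-York-City', 'London']
```

The only difference between the two functions is the indentation of the append call, which decides whether it runs once after the loop or once per item.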

3 Comments

Thanks for your feedback. I modified my code according to your comment. Selenium now opens the pages for both New York and London, but it only returns the results for London. Any idea what the reason could be?
@SeriousRuffy The code you have provided scrolls to the bottom of both pages. You must have left the part that returns your data outside the loop.
@SeriousRuffy Is this your full code? This code does not return any data. If it is not your full code, then you forgot to include the rest of the logic in the for loop. If it is your full code, it works as expected.
1

Python loops are controlled by indentation: only the lines indented beneath the for statement are part of its body.

for i in range(1,4):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(2)
