I am trying to extract some information from a web page, but I don't know how to select only the part I want.

Here is my code:

from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary

capabilities = webdriver.DesiredCapabilities().FIREFOX
capabilities["marionette"] = True
binary = FirefoxBinary("C:/PATH/Mozilla Firefox/firefox.exe")
driver = webdriver.Firefox(firefox_binary=binary, capabilities=capabilities, executable_path="geckodriver.exe")
driver.get("https://www.iparkit.com/Minneapolis")
content = driver.page_source

I would like to extract the street addresses shown in the sidebar. Here is my attempt at getting them:

address = driver.find_element_by_class_name('sidebar')
address.text


' SORT BY DISTANCE\n SORT BY PRICE\nLooking For A Specific Event?\nBUY\n1\nGateway Garage\n\n400 S 3rd Street\nMinneapolis, MN 55415\n 3 mins | Walk Distance\n (612) 338-2643\n$8.00\nBUY\n2\nGovernment Center Garage\n\n415 South 5th Street\nMinneapolis, MN 55415\n 5 mins | Walk Distance\n (612) 338-2643\n$13.00\nBUY\n3\n517 MARQUETTE\n\n517 MARQUETTE AVE\nMINNEAPOLIS, MN 55402\n 6 mins | Walk Distance\n (612) 746-3045\n$14.00\nBUY\n4\nMidtown Garage\n\n11 South 4th St.\nMinneapolis, MN 55402\n 7 mins | Walk Distance\n (612) 333-3940\n$13.00\nBUY\n5\nCentre Village Garage\n\n700 5th Avenue South\nMinneapolis, MN 55415\n 8 mins | Walk Distance\n (612) 338-2643\n$11.00\nBUY\n6\nGaviidae Commons Garage\n\n61 South 6th Street\nMinneapolis, MN 55402\n 8 mins | Walk Distance\n\n$15.00\nBUY\n7\nMarTen\n\n921 Marquette Avenue\nMinneapolis, MN 55402\n 13 mins | Walk Distance\n (612) 334-3498\n$9.00\nBUY\n8\nLoring Garage\n\n1300 Nicollet Mall\nMinneapolis, MN 55403\n 21 mins | Walk Distance\n (612) 338-2643\n$7.00'

How can I get the following result instead?

400 S 3rd Street
415 South 5th Street
517 MARQUETTE AVE
...

1 Answer

You are calling driver.find_element_by_class_name('sidebar'), which returns the text of the entire sidebar. That is why you are getting so much unwanted text in your output.

The text you want is rendered in a div generated by an ng-repeat repeater, since the page is an Angular page.

<div ng-show=" ! searchInProgress" ng-repeat="result in results track by result.id" ng-click="goToLocation(result)" class="module shade mar-15-bot ng-scope" style="cursor: pointer;"> 

You should probably do something like this, although I have not verified that the code is accurate:

get_all_divs = driver.find_elements_by_css_selector('.module.shade.mar-15-bot.ng-scope')

This returns every div produced by the repeater. The text you want is in a p tag inside the first child div of each one.

for i in get_all_divs:
    # print the first <p> that is a direct child of a <div> inside this result
    print(i.find_element_by_css_selector('div > p').text)

Starting from each element matched by those classes, the selector div > p picks the first p that is a direct child of a div, and .text returns the text inside it.
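On recent Selenium releases the find_element_by_* helpers used above are no longer available (they were deprecated in Selenium 4 and removed in later 4.x releases), so the same idea may need to be written with find_elements and a locator strategy. The following is only a sketch: the two selectors are copied from this answer and have not been re-checked against the live page, and extract_addresses is a hypothetical helper name.

```python
try:
    from selenium.webdriver.common.by import By
    CSS = By.CSS_SELECTOR
except ImportError:
    # Lets the sketch load without Selenium installed;
    # "css selector" is the string value behind By.CSS_SELECTOR.
    CSS = "css selector"

# Assumption: selectors taken from the answer text, not verified on the page.
CARD_SELECTOR = ".module.shade.mar-15-bot.ng-scope"
ADDRESS_SELECTOR = "div > p"


def extract_addresses(driver):
    """Return the text of the first 'div > p' match inside every result card."""
    addresses = []
    for card in driver.find_elements(CSS, CARD_SELECTOR):
        paragraphs = card.find_elements(CSS, ADDRESS_SELECTOR)
        if paragraphs:  # skip cards that do not follow the assumed layout
            addresses.append(paragraphs[0].text.strip())
    return addresses
```

Calling extract_addresses(driver) after driver.get(...) should give one string per result card. Because the page is rendered by Angular, the cards may not exist yet when the call is made, so an explicit wait before it may be needed.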

My Python is a bit rusty, so you may need to adjust the for loop I have written.
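If the selectors turn out not to match the page, a fallback is to post-process the sidebar text the question already obtains. This is a different technique from the one above (plain string parsing rather than element lookup). It is only a sketch and assumes that every street line is immediately followed by a "City, ST 12345" line, as in the output quoted in the question. The names CITY_LINE and street_lines are made up for illustration.

```python
import re

# Matches lines such as "Minneapolis, MN 55415" (optionally with a ZIP+4 suffix).
CITY_LINE = re.compile(r"^.+,\s*[A-Z]{2}\s+\d{5}(?:-\d{4})?$")


def street_lines(sidebar_text):
    """Return each non-empty line that directly precedes a 'City, ST ZIP' line."""
    lines = [ln.strip() for ln in sidebar_text.split("\n") if ln.strip()]
    return [prev for prev, cur in zip(lines, lines[1:]) if CITY_LINE.match(cur)]


# Excerpt of the sidebar text shown in the question.
sample = (
    "BUY\n1\nGateway Garage\n\n400 S 3rd Street\nMinneapolis, MN 55415\n"
    " 3 mins | Walk Distance\n (612) 338-2643\n$8.00\n"
    "BUY\n3\n517 MARQUETTE\n\n517 MARQUETTE AVE\nMINNEAPOLIS, MN 55402\n"
)
for street in street_lines(sample):
    print(street)
```

This depends on the text layout staying the same, so the element-based approach is likely the more robust of the two when it works.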


2 Comments

Hi @demouser123 thank you for your response, and apologies I'm quite new with selenium. I am running into some errors on get_all_divs = self.driver.find_elements_by_class_name('.module.shade.mar-15-bot.ng-scope'). When I write get_all_divs = driver.find_elements_by_class_name('.module.shade.mar-15-bot.ng-scope') I'm receiving this: InvalidSelectorException: Message: Given css selector expression "..module.shade.mar-15-bot.ng-scope" is invalid: SyntaxError: '..module.shade.mar-15-bot.ng-scope' is not a valid selector
See the edited answer. Use find_elements_by_css_selector instead of find_elements_by_class_name to get the list of elements.
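The error message in the first comment shows what went wrong: the class-name lookup put its own leading dot in front of the string, so a value that already started with "." became "..module.shade.mar-15-bot.ng-scope". A class-name lookup expects a single bare class name, while a CSS selector can chain several. As a small illustration, a helper (hypothetical, not part of Selenium) could build the compound selector from the class attribute shown in the answer:

```python
def css_from_classes(class_attr):
    """Turn a class attribute value like 'module shade' into '.module.shade'."""
    names = class_attr.split()
    if not names:
        raise ValueError("no class names given")
    # Prefix every class with a dot and join them with no spaces,
    # so the selector requires all classes on the same element.
    return "".join("." + name for name in names)


selector = css_from_classes("module shade mar-15-bot ng-scope")
print(selector)
```

The resulting string is meant for a CSS-selector lookup, not a class-name lookup. Class names containing characters that are special in CSS would need escaping, which this sketch does not handle.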
