1

Consider:

Enter image description here

I am using Selenium to scrape the contents from the App Store: https://apps.apple.com/us/app/bank-of-america-private-bank/id1096813830

I tried to extract the text field "As subject matter experts, our team is very engaging..."

I tried to find elements by class

review_ratings = driver.find_elements_by_class_name('we-truncate we-truncate--multi-line we-truncate--interactive ember-view we-customer-review__body')
review_ratingsList = []
for e in review_ratings:
review_ratingsList.append(e.get_attribute('innerHTML'))
review_ratings

But it returns an empty list [].

Is anything wrong with the code? Or is there a better solution?

2
  • Re "Is anything wrong with the code?": Yes, doesn't even compile (or whatever the equivalent is in Python): IndentationError: expected an indented block Commented Nov 16, 2022 at 1:48
  • Related: Why should I not upload images of code/data/errors? Commented Nov 16, 2022 at 2:03

4 Answers 4

5

Using Requests and Beautiful Soup:

import requests
from bs4 import BeautifulSoup

url = 'https://apps.apple.com/us/app/bank-of-america-private-bank/id1096813830'

res = requests.get(url)
soup = BeautifulSoup(res.text,'lxml')
item = soup.select_one("blockquote > p").text
print(item)

Output:

As subject matter experts, our team is very engaging and focused on our near and long term financial health!
Sign up to request clarification or add additional context in comments.

2 Comments

I like this answer as if you have a list of URLs this is I imagine the quickest approach to go to the specific pages and scrape them. If navigation between pages is needed then this would not work.
Glad to know. Thanks!
3

You can use WebDriverWait to wait for visibility of an element and get the text. Please check good Selenium locator.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

#...

wait = WebDriverWait(driver, 5)
review_ratings = wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, ".we-customer-review")))
for review_rating in review_ratings:
    starts = review_rating.find_element_by_css_selector(".we-star-rating").get_attribute("aria-label")
    title = review_rating.find_element_by_css_selector("h3").text
    review = review_rating.find_element_by_css_selector("p").text

Comments

3

Mix Selenium with Beautiful Soup.

Using WebDriver:

from bs4 import BeautifulSoup
from selenium import webdriver

browser = webdriver.Chrome()
url = "https://apps.apple.com/us/app/bank-of-america-private-bank/id1096813830"
browser.get(url)
innerHTML = browser.execute_script("return document.body.innerHTML")

bs = BeautifulSoup(innerHTML, 'html.parser')

bs.blockquote.p.text

Output:

Out[22]: 'As subject matter experts, our team is very engaging and focused on our near and long term financial health!'

3 Comments

@Juan C what ""return document.body.innerHTML"" does? thank you
AFAIK, it lets you get the HTML code of a dynamic HTML webpage after everything loads. If you try to get the BeautifulSoup object from the raw link it won't have the data of javascript objects that have to load and the like. (I don't come from a software engineering background, so my language might be inaccurate).
@ArthurMorgan if you find it is the best answer could you please mark it so your question doesn't appear in the unanswered questions page ?
2

Use WebDriverWait and wait for presence_of_all_elements_located and use the following CSS selector.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://apps.apple.com/us/app/bank-of-america-private-bank/id1096813830")
review_ratings = WebDriverWait(driver, 20).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, '.we-customer-review__body p[dir="ltr"]')))
review_ratingsList = []
for e in review_ratings:
    review_ratingsList.append(e.get_attribute('innerHTML'))
print(review_ratingsList)

Output:

['As subject matter experts, our team is very engaging and focused on our near and long term financial health!', 'Very much seems to be an unfinished app. Can’t find secure message alert. Or any alerts for that matter. Most of my client team is missing from the “send to” list. I have other functions very useful, when away from my computer.']

2 Comments

May I ask what does the ".we-customer-review__body" do? @KunduK
It is class name.For css selector classname starts with .classname

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.