
I am modifying the play-scraper API to scrape Play Store app details. It uses BeautifulSoup to parse HTML pages [reference].

I am particularly interested in all the additional information available for an app, as shown in the screenshot below (taken from this app).



I am stuck at extracting the list of permissions that an app asks for (shown in the figure above), because the "View details" link under Permissions is the following element:

<a class="hrTbp" jsname="Hly47e">View details</a>

Clicking "View details" opens the list of permissions (screenshot below) that I want to extract.



I am not familiar with JavaScript. Any help would be appreciated.

  • It's not a real link: it does not redirect to another page the way classic <a> tags do (it doesn't have an href attribute anyway). Instead, a listener somewhere opens a popup when the user clicks on the "link". Commented Jun 4, 2020 at 20:36
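To see the commenter's point concretely, the sketch below parses the element quoted in the question and lists its attributes. It uses Python's standard-library `html.parser` rather than BeautifulSoup only so that it runs without extra dependencies; `AnchorCollector` is a hypothetical helper name. With BeautifulSoup, calling `.get("href")` on the same tag would likewise return `None`.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Hypothetical helper: collects the attributes of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.anchors = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs; store it as a dict
            self.anchors.append(dict(attrs))


# The "View details" element quoted in the question
snippet = '<a class="hrTbp" jsname="Hly47e">View details</a>'

collector = AnchorCollector()
collector.feed(snippet)
anchor = collector.anchors[0]
print(anchor)              # {'class': 'hrTbp', 'jsname': 'Hly47e'}
print(anchor.get("href"))  # None: there is no URL for a scraper to follow
```

Since the tag carries no `href`, an HTML-only parser has nothing to request; the permission list only appears after the page's JavaScript handles the click.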

1 Answer


If I understand the question correctly, you are trying to scrape data from a modal. When the page first loads, the modal's data is not present in the HTML; it is fetched only after you click the View details button. That is why the parser does not see the data inside the modal, which in your case is the permission information.

As for the solution, one option is to use Selenium with chromedriver: perform a click on the View details text, then read the modal's data from the updated page. Have a look at this link to get an idea.

Update: To give an idea of the Selenium and chromedriver approach, consider the following code:

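import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

options = Options()
options.add_argument("--headless")  # run Chrome without a visible window
driver = webdriver.Chrome(service=Service("local_path_to_chrome_driver"), options=options)

driver.get(url_of_the_play_store_app)
time.sleep(5)  # give the page time to load
driver.find_element(By.LINK_TEXT, "View details").click()  # trigger the JavaScript click handler
time.sleep(5)  # give the modal time to fetch its data
soup = BeautifulSoup(driver.page_source, "lxml")
driver.quit()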

The soup variable now holds the updated page, including the modal window's data, which you can extract from soup as usual.
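As a rough sketch of that last extraction step, the code below pulls the text of each `<li>` out of a modal's HTML. It assumes each permission is rendered as a list item; the real Play Store markup is not shown in the question and may differ, and `modal_html` is made-up sample input, not scraped data. It again uses the standard-library `html.parser` to stay dependency-free; `ListItemTextCollector` and `extract_list_items` are hypothetical names.

```python
from html.parser import HTMLParser


class ListItemTextCollector(HTMLParser):
    """Hypothetical helper: gathers the text inside each (non-nested) <li>."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._in_li = False

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._in_li = True
            self.items.append("")  # start a new item

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_li = False

    def handle_data(self, data):
        # Text can arrive in several chunks, so append rather than assign
        if self._in_li:
            self.items[-1] += data


def extract_list_items(html):
    parser = ListItemTextCollector()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace and drop empty items
    return [" ".join(item.split()) for item in parser.items if item.strip()]


# Made-up modal markup for illustration only; not the real Play Store HTML
modal_html = """
<div role="dialog">
  <ul>
    <li><span>example permission A</span></li>
    <li><span>example permission B</span></li>
  </ul>
</div>
"""
print(extract_list_items(modal_html))
# ['example permission A', 'example permission B']
```

With the soup from the answer, the analogous BeautifulSoup call would be along the lines of `[li.get_text(" ", strip=True) for li in container.find_all("li")]`, where `container` is the modal element you locate first; searching the whole page for `li` would also pick up unrelated lists.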
