How to scrape JavaScript page with Python

Question

I'm trying to scrape patentsview.org but I'm having an issue. When I try to scrape this page, it doesn't work well. Site using JavaScript to get data from their database. I tried to get the data using requests-html package but I didn't quite understand.

Here's what I tried:

# Import
import re
from bs4 import BeautifulSoup
from requests_html import HTMLSession
session = HTMLSession()

# Set requests
r = session.get('https://datatool.patentsview.org/#search/assignee&asn=1|Samsung')
r.html.render()

# Set BS and print
soup = BeautifulSoup(r.html.html, "lxml")
tags = soup.find_all("div", class_='summary')
print(tags)

This code gives me this result:

# Result
[<div class="summary"></div>]

But I want this:

This is the right div. But I can't see content of div with my code. How can I get the div's content? Hope you understand what I meant.

Have a look at my function posted in stackoverflow.com/a/73054340/4293361 I'm sure it can help... — NeuroMorphing
– NeuroMorphing, Commented Jul 20, 2022 at 16:15

balderman · Accepted Answer · 2021-04-17 14:35:25Z

Use the browser dev tools. (Chrome. F12 - Network - XHR) and see the HTTP GET thst return the data (as JSON) you are looking for.

HTTP GET https://webapi.patentsview.org/api/assignees/query?q={%22_and%22:[{%22_or%22:[{%22_and%22:[{%22_contains%22:{%22assignee_first_name%22:%22Samsung%22}}]},{%22_and%22:[{%22_contains%22:{%22assignee_last_name%22:%22Samsung%22}}]},{%22_and%22:[{%22_contains%22:{%22assignee_organization%22:%22Samsung%22}}]}]}]}&f=[%22assignee_id%22,%22assignee_first_name%22,%22assignee_last_name%22,%22assignee_organization%22,%22assignee_lastknown_country%22,%22assignee_lastknown_state%22,%22assignee_lastknown_city%22,%22assignee_lastknown_location_id%22,%22assignee_total_num_patents%22,%22assignee_first_seen_date%22,%22assignee_last_seen_date%22,%22patent_id%22]&o={%22per_page%22:50,%22matched_subentities_only%22:true,%22sort_by_subentity_counts%22:%22patent_id%22,%22page%22:1}&s=[{%22patent_id%22:%22desc%22},{%22assignee_total_num_patents%22:%22desc%22},{%22assignee_organization%22:%22asc%22},{%22assignee_last_name%22:%22asc%22},{%22assignee_first_name%22:%22asc%22}]

Collectives™ on Stack Overflow

How to scrape JavaScript page with Python

1 Answer 1

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related