Hello World,

I am new to Python and am trying to scrape a JavaScript-rendered page: https://search.gleif.org/#/search/

Below is the result from my code (using the requests library):

<!DOCTYPE html>
<html>
<head><meta charset="utf-8"/>
<meta content="width=device-width,initial-scale=1" name="viewport"/>
<title>LEI Search 2.0</title>
<link href="/static/icons/favicon.ico" rel="shortcut icon" type="image/x-icon"/>
<link href="https://fonts.googleapis.com/css?family=Open+Sans:200,300,400,600,700,900&amp;subset=cyrillic,cyrillic-ext,greek,greek-ext,latin-ext,vietnamese" rel="stylesheet"/>
<link href="/static/css/main.045139db483277222eb714c1ff8c54f2.css" rel="stylesheet"/></head>
<body>
<div id="app"></div>
<script src="/static/js/manifest.2ae2e69a05c33dfc65f8.js" type="text/javascript"></script>
<script src="/static/js/vendor.6bd9028998d5ca3bb72f.js" type="text/javascript"></script>
<script src="/static/js/main.5da23c5198041f0ec5af.js" type="text/javascript"></script>
</body>
</html>

The question: instead of retrieving the script tags shown above, for example:
"src="/static/js/manifest.2ae2e69a05c33dfc65f8.js" type="text/javascript""

I would like to get the content of the table so that I can store it.

Table that I want to scrape: [image not available]
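For context, the response above is only an empty application shell: the `<div id="app">` has no content, and the table is filled in later by the three scripts. A minimal sketch, using only the standard library, that confirms this on the markup shown above (the `ShellInspector` class name is made up for illustration):

```python
from html.parser import HTMLParser


class ShellInspector(HTMLParser):
    """Collects script sources and any text found inside <div id="app">."""

    def __init__(self):
        super().__init__()
        self.scripts = []    # src attribute of every external script
        self.app_text = []   # text nodes found inside the #app div
        self._div_depth = 0  # > 0 while the parser is inside the #app div

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and attrs.get("src"):
            self.scripts.append(attrs["src"])
        elif tag == "div" and (self._div_depth or attrs.get("id") == "app"):
            self._div_depth += 1

    def handle_endtag(self, tag):
        if tag == "div" and self._div_depth:
            self._div_depth -= 1

    def handle_data(self, data):
        if self._div_depth and data.strip():
            self.app_text.append(data.strip())


# Body of the response shown in the question
html = """<body>
<div id="app"></div>
<script src="/static/js/manifest.2ae2e69a05c33dfc65f8.js" type="text/javascript"></script>
<script src="/static/js/vendor.6bd9028998d5ca3bb72f.js" type="text/javascript"></script>
<script src="/static/js/main.5da23c5198041f0ec5af.js" type="text/javascript"></script>
</body>"""

inspector = ShellInspector()
inspector.feed(html)
print(inspector.app_text)      # [] : the container is empty
print(len(inspector.scripts))  # 3  : the content is produced by these scripts
```

Since a plain HTTP client does not execute those scripts, the table never appears in the downloaded HTML.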

  • What exactly do you want to find? Commented Nov 10, 2019 at 22:51
  • So the question is how to set proxy auth in selenium? You can google that and find some workarounds for selenium's limitations. Commented Nov 10, 2019 at 23:46
  • @pguardiario the question is how do I get the table content instead of the JS script. Do you have any hints? Commented Nov 24, 2019 at 10:25
  • @SuperStormer, I want to scrape the table, but instead I'm getting the JS script. Do you have any idea how to deal with it? Commented Nov 24, 2019 at 10:28
  • You would use selenium for that. Commented Nov 25, 2019 at 0:56

1 Answer

The following code uses Selenium's Python bindings.

import time
from selenium import webdriver
from selenium.webdriver.common.by import By

# One list per table column, filled page by page
country = []
legal_name = []
lei = []

driver = webdriver.Chrome()
driver.implicitly_wait(5)

# The page count is hard-coded; adjust it to the number of result pages
for page in range(1, 30395):
    driver.get('https://search.gleif.org/#/search/fulltextFilterId=LEIREC_FULLTEXT&currentPage=' + str(page) + '&perPage=50&expertMode=false#results-section')

    # Give the JavaScript time to render the table
    time.sleep(5)

    # find_elements(By.XPATH, ...) replaces the deprecated find_elements_by_xpath
    country += [cell.get_attribute('innerHTML') for cell in driver.find_elements(By.XPATH, '//*[@class="table-cell country"]/a')]
    legal_name += [cell.get_attribute('innerHTML') for cell in driver.find_elements(By.XPATH, '//*[@class="table-cell legal-name"]/a')]
    lei += [cell.get_attribute('innerHTML') for cell in driver.find_elements(By.XPATH, '//*[@class="table-cell lei"]/a')]
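The fixed `time.sleep(5)` on every page adds up over tens of thousands of pages. One alternative is to poll until the rows are present and continue as soon as they are. The helper below is a plain-Python sketch of that idea, not part of Selenium; the name `wait_until` and its default timings are assumptions. Selenium ships its own `WebDriverWait` class in `selenium.webdriver.support.ui` for the same purpose.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.25):
    """Call predicate() repeatedly until it returns a truthy value, then return it.

    Raises TimeoutError if nothing truthy is returned within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(interval)


# Hypothetical use inside the loop, in place of time.sleep(5):
# cells = wait_until(lambda: driver.find_elements(By.XPATH, '//*[@class="table-cell lei"]/a'))
```

With the implicit wait of 5 seconds still set on the driver, each `find_elements` call may itself block for up to that long, so the two mechanisms are best not combined.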

Logging in (replace the locators with the ones used on your page):

from selenium.webdriver.common.by import By

driver.find_element(By.ID, "UserName").send_keys("xxxx")
driver.find_element(By.NAME, "Password").send_keys("yyyy")
driver.find_element(By.CLASS_NAME, "loginButton").click()

Get the page content (the rendered HTML):

print(driver.page_source)
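Since the goal in the question is to store the table, the three lists can be written out as CSV once the loop has finished. A minimal sketch using the standard `csv` module; the function name `write_rows`, the column headers, and the sample values are made up for illustration:

```python
import csv
import io


def write_rows(out, country, legal_name, lei):
    """Write a header plus one CSV row per scraped record to the open text file `out`."""
    writer = csv.writer(out)
    writer.writerow(["country", "legal_name", "lei"])
    # zip stops at the shortest list, so the three lists should have equal length
    writer.writerows(zip(country, legal_name, lei))


# Example with made-up values, written to an in-memory buffer
sample = io.StringIO()
write_rows(sample, ["DE"], ["Example GmbH"], ["00000000000000000000"])
print(sample.getvalue())
```

To write a real file, open it with `open("lei.csv", "w", newline="", encoding="utf-8")` and pass the file object as `out`. If a cell is missing on some page, the lists can drift out of step, so collecting the values row by row would be safer than column by column.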

6 Comments

Thank you very much, it works :) I just have two more questions: how can I set my username and password (Firefox asks me for authentication each time)? Furthermore, how can I display just the content of the result (like request.content)?
@Annis15 Edited the answer to include your 2 questions.
Thanks again for the page content. However, when I add your find_element_by_id code, nothing happens (with my username and password). I tried changing "UserName" to "Username:" (as the Firefox popup displays it), but still nothing. However, the first question is solved; right now I'm just trying to optimize the script.
@Annis15 Could you please provide the URL of the page you're trying to scrape, so I can provide the exact code for the login?
The page is the same one that you scraped: driver.get('search.gleif.org/#/search/…)