
I am having a weird issue with Python and Selenium. I am accessing the URL https://www.biggerpockets.com/users/JarridJ1. When you click More, it shows further content. I understand that it is a React-based website. When I view it in a browser and do a View Source, I can see the required data in a React element: <div data-react-class="Profile/Header/Header" data-react-props="{&quot. I tried to automate Firefox via Selenium, but I could not get the content that way either. Check the screenshot:

Below is the code I tried:

from time import sleep

from selenium import webdriver
from selenium.webdriver.firefox.options import Options


def parse(u):
    print('Processing... {}'.format(u))
    driver.get(u)
    sleep(2)
    html = driver.page_source
    driver.save_screenshot('bp.png')
    print(html)


if __name__ == '__main__':
    options = Options()
    options.add_argument("--headless")  # run the browser without a visible window
    driver = webdriver.Firefox(options=options)  # pass the options so --headless takes effect
    parse('https://www.biggerpockets.com/users/JarridJ1')
  • As per the code, you are trying to fetch the page source, right? What issue are you facing, or what error are you getting? Commented Mar 25, 2020 at 9:54
  • @SameerArora If you do a page source view-source:https://www.biggerpockets.com/users/JarridJ1 in the browser, you will find text like [email protected], but when you check the HTML returned by Selenium, it is not there. Commented Mar 25, 2020 at 10:10

1 Answer


This is a tricky one, but I found a way to get to the element you have highlighted. I am still not sure why driver.page_source is not returning what you are looking for.

def parse(u):
    print('Processing... {}'.format(u))
    driver.get(u)
    sleep(2)
    get_everything = driver.find_elements_by_xpath("//*")
    for element in get_everything:
        print(element.get_attribute('innerHTML'))

    #html = driver.page_source
    #driver.save_screenshot('bp.png')
    #print(html)

Below is my standalone example:

from selenium import webdriver
import time


driver = webdriver.Chrome(r"C:\Path\To\chromedriver.exe")  # raw string so the backslashes are not treated as escapes
driver.get("https://www.biggerpockets.com/users/JarridJ1")
time.sleep(5)
a = driver.find_element_by_xpath("//div[@data-react-class='Profile/Header/Header']")
b = a.get_attribute("data-react-props")
print(b)
c = driver.find_elements_by_xpath("//*")
for i in c:
    print(i.get_attribute('innerHTML'))
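Note that the data-react-props value is itself a JSON string, so once you have it you can load it into a dict instead of grepping raw text. A minimal sketch using only the standard library's json module; the sample payload and its field names below are invented for illustration, and the real attribute on the profile page is much larger:

```python
import json

# Invented stand-in for what get_attribute("data-react-props") returns;
# Selenium hands the attribute back with HTML entities already decoded.
b = '{"profile": {"name": "Jarrid J.", "location": "Philadelphia, PA"}}'

props = json.loads(b)  # the attribute is plain JSON once decoded
print(props["profile"]["name"])
```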


4 Comments

I have updated the question. The required text is present though not visible. I just attached the screenshot. I did not click more for it.
@Volatil3 I think I understand better what you are looking for now. See the updated answer.
Wow, yes, this works. Though it is not so relevant: can I do something similar without Selenium, using requests and bs4 instead? I mean a = driver.find_element_by_xpath("//div[@data-react-class='Profile/Header/Header']")?
I find that requests does not work well for React hosted sites.
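That said, since the question notes the props are present in the raw page source, a plain HTTP fetch can work in principle for this particular attribute: download the HTML and pull data-react-props out of the static markup. A minimal dependency-free sketch using the standard library's html.parser (with bs4 the equivalent would be soup.find("div", attrs={"data-react-class": "Profile/Header/Header"})); the inline HTML snippet is a stand-in for the real requests.get(url).text response, and its props content is invented:

```python
import json
from html.parser import HTMLParser


class ReactPropsExtractor(HTMLParser):
    """Collect data-react-props from the div with a matching data-react-class."""

    def __init__(self, react_class):
        super().__init__()
        self.react_class = react_class
        self.props = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser has already decoded entity references like &quot;
        # in the attribute values by the time they reach this callback.
        attrs = dict(attrs)
        if tag == "div" and attrs.get("data-react-class") == self.react_class:
            self.props = json.loads(attrs["data-react-props"])


# Stand-in for the HTML you would get from requests.get(url).text
html = ('<div data-react-class="Profile/Header/Header" '
        'data-react-props="{&quot;name&quot;: &quot;Jarrid&quot;}"></div>')

extractor = ReactPropsExtractor("Profile/Header/Header")
extractor.feed(html)
print(extractor.props)
```

This only works because the site serializes the props into the initial HTML; anything rendered client-side after load would still need a real browser.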
