I have Python code that can parse data from a string variable containing HTML code.

I want code that fetches the HTML from a URL and then parses it the same way.

The working code (parsing HTML from a string):

from bs4 import BeautifulSoup

data = '''\
<html>
  <head>
    <meta name="generator"
     content="HTML Tidy for HTML5 (experimental) for Windows https://github.com/w3c/tidy- 
      html5/tree/c63cc39" />
    <title></title>
   </head>
 <body>
<div class="Eqh F6l Jea k1A zI7 iyn Hsu">
  <div class="Shl zI7 iyn Hsu">
    <a data-test-id="search-guide" href="" title="Search for &quot;living room colors&quot;">
      <div class="Jea Lfz XiG fZz gjz qDf zI7 iyn Hsu" style="white-space: nowrap; background-color: 
         rgb(162, 152, 139);">
        <div class="tBJ dyH iFc MF7 erh tg7 IZT mWe">Living</div>
       </div>
      </a>
     </div>
    </div>
  </body>
 </html>
 '''
soup = BeautifulSoup(data, 'html.parser')
a = soup.select('div.Eqh.F6l.Jea.k1A.zI7.iyn.Hsu a')[0]
print(a['title'])

Here is what I have tried that does not work (getting HTML from URL and then parsing):

import requests
from bs4 import BeautifulSoup

vgm_url = 'https://www.pinterest.com/search/pins/?q=skin%20care'
html_text = requests.get(vgm_url).text
soup = BeautifulSoup(html_text, 'html.parser')
# The selector is evaluated once; the earlier standalone select() call was redundant.
for a in soup.select('div.Eqh.F6l.Jea.k1A.zI7.iyn.Hsu a'):
    print(a['title'])

I'm not getting any error, but it does not print anything either. I appreciate your help.

  • Are you really sure that html_text has the text that you want? That is, it contains the contents you want instead of, say, a login page? Commented Nov 26, 2020 at 10:50

1 Answer

As a first debugging step, use print(html_text) to see what you are actually getting ;).

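When you print it, you will see that it differs from the page as rendered in a browser (open the URL in Chrome or another web browser and inspect the page). You can also see that the page keeps loading for a moment when you open it in a browser: the content you are after is filled in by JavaScript after the initial HTML arrives, and requests does not run JavaScript.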
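Reading a long HTML dump by eye is tedious, so here is a minimal sketch of a quicker check. The helper name describe_html and the marker string are hypothetical; the marker is taken from the data-test-id attribute in the question's sample HTML, on the assumption that the rendered page still uses it.

```python
def describe_html(html_text, marker):
    """Summarise a response body so it can be compared with the rendered page."""
    return {
        # Total size of the body in characters.
        "length": len(html_text),
        # True if the substring we expect to parse is present at all.
        "marker_found": marker in html_text,
        # Many <script> tags and no marker suggests a JavaScript-rendered page.
        "script_tags": html_text.lower().count("<script"),
    }


# Usage with the question's code (needs network access, so left as a comment):
# html_text = requests.get(vgm_url).text
# print(describe_html(html_text, 'data-test-id="search-guide"'))
```

If marker_found is False for the body returned by requests, the selector cannot match anything, which would explain why the loop prints nothing.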

Therefore you need a tool that runs the page's JavaScript and waits for the content to load, such as Selenium.

To demonstrate a bit of Selenium, I loaded your page, waited for an element with a known class name that only appears after a while, and clicked it:

# You will have to install chromedriver (for Chrome), or another browser's driver.
from selenium.webdriver import Chrome
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Selenium 4 takes the driver path through a Service object; I have chromedriver installed here.
driver = Chrome(service=Service(r'C:\Program Files\chromedriver.exe'))

driver.get("https://www.pinterest.com/search/pins/?q=skin%20care")
# Wait up to 3 seconds for an element with this class name to appear in the DOM.
pin_image = WebDriverWait(driver, 3).until(
    EC.presence_of_element_located(
        (By.CLASS_NAME, 'GrowthUnauthPinImage__Image')))
pin_image.click()
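To get from the Selenium demo back to the original goal of printing titles, one option is to hand the rendered DOM to BeautifulSoup. This is only a sketch under assumptions: extract_titles and fetch_rendered_html are hypothetical names, the a[data-test-id="search-guide"] selector comes from the sample HTML in the question and may not match the live page, and chromedriver is assumed to be discoverable by Selenium.

```python
from bs4 import BeautifulSoup

# Selector taken from the question's sample HTML; the live markup may differ.
GUIDE_SELECTOR = 'a[data-test-id="search-guide"]'


def extract_titles(html):
    """Return the title attribute of every search-guide link in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return [a["title"] for a in soup.select(GUIDE_SELECTOR) if a.has_attr("title")]


def fetch_rendered_html(url, timeout=10):
    """Load the page in Chrome, wait for a search-guide link, return the DOM."""
    # Imported here so extract_titles() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumes chromedriver can be found by Selenium
    try:
        driver.get(url)
        # Block until at least one matching link exists, or raise TimeoutException.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, GUIDE_SELECTOR)))
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even if the wait times out
```

Calling extract_titles(fetch_rendered_html('https://www.pinterest.com/search/pins/?q=skin%20care')) would then return the titles as a list, provided the page still uses that attribute. Waiting on a data attribute rather than the long class list is a deliberate choice here, since generated class names like Eqh or F6l tend to change between deployments.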
12 Comments

Thanks for the response, but I want the proper result, the one the working code produces. Just printing out the long HTML won't solve my problem, unless you have a hint on how to use it.
If the html_text is the same as data (your example), and your example works, then what you tried has to work as well, right?
Thanks for the response. I looked at the printed result, a long block of HTML, and the markup that is supposed to be there was missing. I am confused now.
@Brambor Is this even possible with requests? I think he needs to use Selenium, doesn't he? I see his soup has just one main div in it and nothing else.
@Dave99 I added a demo for Selenium to my answer ;).