Python, How to parse HTML from URL?

Question

I have Python code that can parse data from a string variable containing HTML code.

I want code that gets the HTML from a URL and then parses it.

The working code (parsing HTML):

from bs4 import BeautifulSoup

data = '''\
<html>
  <head>
    <meta name="generator"
     content="HTML Tidy for HTML5 (experimental) for Windows https://github.com/w3c/tidy-html5/tree/c63cc39" />
    <title></title>
   </head>
 <body>
<div class="Eqh F6l Jea k1A zI7 iyn Hsu">
  <div class="Shl zI7 iyn Hsu">
    <a data-test-id="search-guide" href="" title="Search for &quot;living room colors&quot;">
      <div class="Jea Lfz XiG fZz gjz qDf zI7 iyn Hsu" style="white-space: nowrap; background-color: 
         rgb(162, 152, 139);">
        <div class="tBJ dyH iFc MF7 erh tg7 IZT mWe">Living</div>
       </div>
      </a>
     </div>
    </div>
  </body>
 </html>
 '''
soup = BeautifulSoup(data, 'html.parser')
a = soup.select('div.Eqh.F6l.Jea.k1A.zI7.iyn.Hsu a')[0]
print(a['title'])

Here is what I have tried that does not work (getting HTML from URL and then parsing):

import requests
from bs4 import BeautifulSoup

vgm_url = 'https://www.pinterest.com/search/pins/?q=skin%20care'
html_text = requests.get(vgm_url).text
soup = BeautifulSoup(html_text, 'html.parser')
a = soup.select('div.Eqh.F6l.Jea.k1A.zI7.iyn.Hsu a')
for a in soup.select('div.Eqh.F6l.Jea.k1A.zI7.iyn.Hsu a'):
    print(a['title'])

I'm not getting any error, but it does not print anything. I appreciate your help.

Are you really sure that html_text has the text that you want? That is, does it contain the content you want instead of, say, a login page?
– pepoluan, Commented Nov 26, 2020 at 10:50

Brambor · Accepted Answer · 2020-11-26 14:53:06Z

When debugging, use print(html_text) to see what you are actually getting ;).

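When you print it, you see that it is different from what the browser shows for the same page (open the URL in Chrome or another web browser and inspect the page). You can also see that the page keeps loading for a bit when you open it in a browser, which suggests the content is added by JavaScript after the initial response.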
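Instead of reading the whole printout by eye, the check can be narrowed down. Here is a minimal sketch using only the standard library: it counts the links and scripts in the text and reports which of the class names from your selector never appear. The names `TagCounter` and `summarize_html` are made up for this illustration, and the `shell` string is a stand-in for a response, not what Pinterest actually returns.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count start tags and collect every class name seen in the document."""

    def __init__(self):
        super().__init__()
        self.tag_counts = {}
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        self.tag_counts[tag] = self.tag_counts.get(tag, 0) + 1
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())


def summarize_html(html_text, wanted_classes):
    """Return link/script counts plus the wanted class names missing from the HTML."""
    counter = TagCounter()
    counter.feed(html_text)
    counter.close()
    missing = sorted(set(wanted_classes) - counter.classes)
    return {
        "links": counter.tag_counts.get("a", 0),
        "scripts": counter.tag_counts.get("script", 0),
        "missing_classes": missing,
    }


# Hypothetical stand-in for a JavaScript-rendered response: an empty shell plus a script.
shell = '<html><body><div id="root"></div><script src="app.js"></script></body></html>'
report = summarize_html(shell, ["Eqh", "F6l"])
print(report)
```

With the real response you would call `summarize_html(html_text, ["Eqh", "F6l", "Jea"])`. If the class names are reported missing and the text is mostly scripts, the selector has nothing to match in what requests downloaded.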

Therefore you need to wait for it to load with something like Selenium.

To demonstrate a bit of Selenium, I loaded your page and clicked something with a defined class that loaded after a while:

# You will have to install ChromeDriver, or another browser driver.
# Written for Selenium 4, where the driver path is passed through a Service object.
from selenium.webdriver import Chrome
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = Chrome(service=Service(r'C:\Program Files\chromedriver.exe'))  # I have ChromeDriver installed here

driver.get("https://www.pinterest.com/search/pins/?q=skin%20care")
pin_image = WebDriverWait(driver, 3).until(  # wait up to 3 seconds for the element to load
    EC.presence_of_element_located(
        (By.CLASS_NAME, 'GrowthUnauthPinImage__Image')))  # identify the element by class name
pin_image.click()
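To get from this demo back to the parsing in the question, the rendered HTML can be read from `driver.page_source` once the wait succeeds and then handed to a parser. Below is a minimal sketch, assuming Selenium 4 and a Chrome driver that Selenium can find on its own. The names `fetch_rendered_html`, `LinkTitleParser` and `link_titles` are made up for this illustration, and matching on `data-test-id="search-guide"` is an assumption taken from the sample HTML in the question, not something verified against the live page. The parsing part uses the standard library's `html.parser` so that it runs without a browser; the BeautifulSoup selector from the question can be applied to the same string.

```python
from html.parser import HTMLParser


def fetch_rendered_html(url, wait_css, timeout=10):
    """Load url in Chrome, wait for wait_css to appear, return the rendered HTML."""
    # Imported here so the parsing part below runs without Selenium installed.
    from selenium.webdriver import Chrome
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = Chrome()  # assumes Selenium can locate a Chrome driver on this machine
    try:
        driver.get(url)
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, wait_css)))
        return driver.page_source  # HTML after JavaScript has run
    finally:
        driver.quit()  # always close the browser, even if the wait times out


class LinkTitleParser(HTMLParser):
    """Collect the title of every <a> whose data-test-id equals test_id."""

    def __init__(self, test_id):
        super().__init__()
        self.test_id = test_id
        self.titles = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and attributes.get("data-test-id") == self.test_id:
            title = attributes.get("title")
            if title:
                self.titles.append(title)


def link_titles(html_text, test_id="search-guide"):
    """Return the titles of the matching links found in html_text."""
    parser = LinkTitleParser(test_id)
    parser.feed(html_text)
    parser.close()
    return parser.titles


# Static stand-in for rendered HTML, shortened from the question's sample.
sample = ('<div class="Shl zI7 iyn Hsu"><a data-test-id="search-guide" href="" '
          'title="Search for &quot;living room colors&quot;">Living</a></div>')
print(link_titles(sample))
```

With a browser available, the two parts would be chained as `link_titles(fetch_rendered_html(url, 'a[data-test-id="search-guide"]'))`. If the wait times out, the selector is probably not present on the rendered page and needs adjusting.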

edited Nov 26, 2020 at 14:53

answered Nov 26, 2020 at 10:32

Brambor

12 Comments

Dave99 Over a year ago

Thanks for the response, but I want the proper result, the one the working code produces. Just printing out the long HTML won't solve my problem, unless you have a hint on how to use it.

Brambor Over a year ago

If the html_text is the same as data (your example), and your example works, then what you tried has to work as well, right?

Dave99 Over a year ago

Thanks for the response. I looked at the printed result, a long HTML document, and the markup that is supposed to be there was not in it. I am confused now.

Abhishek Rai Over a year ago

@Brambor Is this even possible with requests? I think he needs to use Selenium, doesn't he? I see his soup has just one main div in it, nothing else.

Brambor Over a year ago

@Dave99 I added a demo for Selenium to my answer ;).
