
For a project I have to scrape data from several different websites, and I'm having a problem with one of them.

When I look at the page source, the data I want is in a table, so it seems easy to scrape. But when I run my script, that part of the source doesn't show up.

Here is my code. I tried different things: at first there weren't any headers, then I added some, but it made no difference.

# import libraries
import urllib2
from bs4 import BeautifulSoup
import csv  
import requests

# specify the url 
quote_page = 'http://www.airpl.org/Pollens/pollinariums-sentinelles'

# query the website and store the reply in the variable `response`
response = requests.get(quote_page)  
response.addheaders = [('User-agent', 'Mozilla/5.0')]
print(response.text)

# parse the HTML using Beautiful Soup and store the result in the variable `soup`
soup = BeautifulSoup(response.text, 'html.parser')  

with open('allergene.txt', 'w') as f:
    f.write(soup.encode('UTF-8', 'ignore'))

What I'm looking for on the website is the data after "Herbacée", whose HTML looks like this:

<p class="level1">

      <img src="/static/img/state-0.png" alt="pas d'émission" class="state">

    Herbacee
  </p>
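Once the markup is actually in hand, pulling the plant name and its pollen level out of it is straightforward. The sketch below is a minimal offline check, assuming the `<p class="level1">` / `<img class="state">` structure shown above is representative of every row; the `SNIPPET` and `rows` names are illustrative only. Note that the level is only available through the image's `alt` text, not as visible text.

```python
from bs4 import BeautifulSoup

# Copy of the markup from the question, so the selectors can be tested offline.
SNIPPET = """
<p class="level1">
      <img src="/static/img/state-0.png" alt="pas d'émission" class="state">
    Herbacee
  </p>
"""

soup = BeautifulSoup(SNIPPET, "html.parser")

rows = []
for p in soup.find_all("p", class_="level1"):
    # The plant name is the only non-blank text inside the <p>.
    name = p.get_text(strip=True)
    # The pollen level is carried by the <img>'s alt attribute, not by visible text.
    img = p.find("img", class_="state")
    rows.append((name, img.get("alt") if img else None))

print(rows)
```

If this works on the snippet but finds nothing in `response.text`, the markup simply isn't in the HTML that requests received.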

Do you have any idea what's wrong?

Thanks for your help, and happy new year, guys :)

  • The page may use JavaScript to add the data. BeautifulSoup and requests don't run JavaScript. – Commented Jan 2, 2017 at 15:37
  • BTW: you add the headers after you have already received the data (`response.addheaders`), which has no effect; you have to pass them in the request itself, `get(..., headers=headers)`. – Commented Jan 2, 2017 at 15:41
  • I tried what you said, but it made no difference. Maybe there is some JavaScript; I hadn't thought about that. There is one line: <script type="text/javascript"> $(document).ready(function() { load_garden_state("/gardens/garden/1/state/"); init_calendar_link(); }); </script> It looks like that could be it. Is there any way to get that data anyway? – Commented Jan 2, 2017 at 15:43
  • You can always turn off JavaScript in your browser and open the page. You will see what you can get without JavaScript. – Commented Jan 2, 2017 at 15:45
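The two suggestions in these comments can be sketched together: send the headers with the request rather than attaching them to the response afterwards, and check whether the visible text is present in the raw HTML at all (the scripted equivalent of disabling JavaScript). The helper names `build_request`, `fetch` and `rendered_server_side` are hypothetical, and the `"Herbac"` marker is an assumption chosen so that both `Herbacée` and an entity-encoded `Herbac&eacute;e` match.

```python
import requests

URL = "http://www.airpl.org/Pollens/pollinariums-sentinelles"

# Headers must travel with the request; setting attributes on the
# response object afterwards changes nothing that was sent.
HEADERS = {"User-Agent": "Mozilla/5.0"}


def build_request(url):
    """Prepare (but do not send) a GET request carrying the custom headers."""
    return requests.Request("GET", url, headers=HEADERS).prepare()


def fetch(url):
    """Send the GET request with the headers and return the raw HTML."""
    response = requests.get(url, headers=HEADERS, timeout=10)
    response.raise_for_status()
    return response.text


def rendered_server_side(html, marker="Herbac"):
    """Rough check: does text seen in the browser appear in the raw HTML?

    If it does not, it is probably inserted later by JavaScript,
    which requests does not execute.
    """
    return marker in html
```

Calling `rendered_server_side(fetch(URL))` and getting `False` would be consistent with the `load_garden_state(...)` call quoted above filling the table in after page load.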

1 Answer


This page uses JavaScript to render the table; the actual page that contains the table is:

http://www.alertepollens.org/gardens/garden/1/state/

You can find this URL in Chrome DevTools > Network.
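Requesting that URL directly, instead of the JavaScript-rendered page, might look like the sketch below. It assumes the fragment still uses the `<p class="level1">` markup from the question and that the endpoint still exists (it may have moved since 2017); `parse_state`, `to_csv` and `fetch_state` are hypothetical helper names. Passing `response.content` (bytes) lets BeautifulSoup detect the encoding itself, which avoids garbled accents if the server omits a charset and requests falls back to ISO-8859-1.

```python
import csv
import io

import requests
from bs4 import BeautifulSoup

# Endpoint found in the Network tab; it may have changed since the answer was written.
STATE_URL = "http://www.alertepollens.org/gardens/garden/1/state/"


def parse_state(html):
    """Extract (plant, state) pairs, assuming the <p class="level1"> markup."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for p in soup.find_all("p", class_="level1"):
        img = p.find("img", class_="state")
        # The state is only exposed through the image's alt text.
        rows.append((p.get_text(strip=True), img.get("alt") if img else None))
    return rows


def to_csv(rows):
    """Serialise the rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["plant", "state"])
    writer.writerows(rows)
    return buf.getvalue()


def fetch_state(url=STATE_URL):
    """Request the fragment directly, bypassing the JavaScript-rendered page."""
    response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=10)
    response.raise_for_status()
    # Bytes, so BeautifulSoup can sniff the encoding itself.
    return parse_state(response.content)
```

Usage would be something like `print(to_csv(fetch_state()))`, or writing that string to a file such as the question's `allergene.txt`.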



1 Comment

I'm not in the habit of using those tools. Thanks for your quick answer, that's what I needed :)
