Parser fails to read HTML content using Ruby

Question

I am trying to parse some HTML content using Ruby. I use the following code:

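require 'open-uri'

url = 'http://www.fooducate.com/appo#!page=browse&nav=0'
# URI.open is needed on Ruby 3.0+; older versions also accepted a bare open(url)
html = URI.open(url)
# Copy the raw response body to disk (no parsing happens here)
IO.copy_stream(html, 'test.html')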

But all I get is the content div with nothing inside it:

<div id="page-content" class="content group">
</div>

Is this a bug in the parser? How can I fix this problem?

Note that you're not actually using an HTML parser. You're just downloading an IO stream via HTTP; IO.copy_stream does not care about the content. A parser would be something like Nokogiri, which reads the HTML document and builds a representation of it so you can read or manipulate the document.
– max, Commented Mar 7, 2016 at 14:48
I have already tried to use Nokogiri to get the content, but I get the same result.
– Ayoub Abid, Commented Mar 7, 2016 at 14:51

Kristján · Accepted Answer · 2016-03-07 14:44:02Z


If you look at the comment just above that div, you'll see the rest of the content is loaded via JavaScript. To retrieve it, you'd need to run the page's scripts like a browser would, or otherwise emulate the second fetch.

<!-- hook for any page content - JS Navigation object expects that -->
<div id="page-content" class="content group">
</div>

This behavior is visible when you load the page through your browser. Notice that the navigation and layout load, but you see a "Loading" message for a few seconds before the content fills in.
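
One way to confirm this from Ruby is to check whether the placeholder div in the downloaded markup is empty. The following is a minimal sketch, not a general HTML parser: the helper name `empty_placeholder?` is hypothetical, and the regex assumes a `<div>` with no nested `<div>` inside it, which holds for the snippet shown above.

```ruby
# Quick check: is the <div> with the given id present but empty in the raw HTML?
# Regex-based on purpose to stay dependency-free; assumes no nested <div>.
def empty_placeholder?(html, id)
  pattern = /<div\b[^>]*\bid=["']#{Regexp.escape(id)}["'][^>]*>(.*?)<\/div>/m
  match = html.match(pattern)
  return false if match.nil? # no such div at all

  # Whitespace-only content counts as empty
  match[1].strip.empty?
end

# The markup returned by the plain HTTP fetch, as quoted in the answer
static_html = <<~HTML
  <!-- hook for any page content - JS Navigation object expects that -->
  <div id="page-content" class="content group">
  </div>
HTML

puts empty_placeholder?(static_html, 'page-content')
```

If this prints `true` for the file you saved, the server really did send an empty container, and the missing content has to come from a later request made by the page's JavaScript.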


2 Comments

Ayoub Abid Over a year ago

What do you recommend to get the same result as the browser? Or how can I do the second fetch?

Kristján Over a year ago

You could try something like ExecJS to run the JavaScript. Or it looks like the data comes from fooducate.com/internal/chef_client_proxy/get_browse_tree if you want to start with JSON instead.
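
The JSON route could look roughly like the sketch below, using only the standard library. Several things here are assumptions: the `http://` scheme, that the endpoint answers a plain GET without extra parameters, headers, or cookies, and that the body is JSON. The method names `fetch_browse_tree` and `parse_browse_tree` are hypothetical, and the shape of the returned data is not known from this thread.

```ruby
require 'net/http'
require 'json'
require 'uri'

# Endpoint mentioned in the comment above; the http:// scheme is an assumption.
BROWSE_TREE_URL = 'http://fooducate.com/internal/chef_client_proxy/get_browse_tree'

# Parse a response body as JSON, returning nil if it is not valid JSON.
def parse_browse_tree(body)
  JSON.parse(body)
rescue JSON::ParserError
  nil
end

# Fetch the endpoint and return the parsed JSON, or nil on a non-2xx response.
# Redirects are not followed in this sketch.
def fetch_browse_tree(url = BROWSE_TREE_URL)
  response = Net::HTTP.get_response(URI.parse(url))
  return nil unless response.is_a?(Net::HTTPSuccess)

  parse_browse_tree(response.body)
end
```

Calling `fetch_browse_tree` and inspecting the result (for example with `pp`) is a reasonable first step. A `nil` result means the request did not succeed or the body was not JSON, in which case the browser's network tab shows the exact request the page makes.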
