I'm trying to parse some HTML content using Ruby. I use the following code:

require 'open-uri'

url = 'http://www.fooducate.com/appo#!page=browse&nav=0'
# On Ruby 3.0+ use URI.open; plain Kernel#open no longer fetches URLs
URI.open(url) do |html|
  IO.copy_stream(html, 'test.html')
end

But all I get is the content div, with nothing inside it:

<div id="page-content" class="content group">
</div>

Is that a bug in the parser? How can I fix this problem?

  • Note that you're not actually using an HTML parser. You're just downloading an IO stream via HTTP; IO.copy_stream does not care about the content. A parser would be something like Nokogiri, which reads the HTML document and builds a representation of it so you can read or manipulate the document. Commented Mar 7, 2016 at 14:48
  • I have already tried using Nokogiri to get the content, but I get the same result. Commented Mar 7, 2016 at 14:51

1 Answer

If you look at the comment just above that div, you'll see the rest of the content is loaded via JavaScript. To retrieve it, you'd need to run the page's scripts like a browser would, or otherwise emulate the second fetch.

<!-- hook for any page content - JS Navigation object expects that -->
<div id="page-content" class="content group">
</div>
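Part of the reason the download never contains the content: the #!page=browse&nav=0 part of the URL is a fragment, and HTTP clients (open-uri included) do not send fragments to the server. The server therefore returns the same shell page for every such URL, and it is the page's JavaScript that reads the fragment and loads the content. A minimal stdlib sketch of what is actually requested:

```ruby
require 'uri'

# URL from the question; everything after '#' is a client-side fragment.
url = 'http://www.fooducate.com/appo#!page=browse&nav=0'
uri = URI.parse(url)

# request_uri is the path (plus any query) an HTTP client puts on the
# request line; the fragment is not part of it.
puts uri.request_uri # prints "/appo"
puts uri.fragment    # prints "!page=browse&nav=0"
```

So no parser, Nokogiri or otherwise, can find content that was never in the response; it has to come from a second request.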

You can see this behavior when you load the page in your browser: the navigation and layout load, but a "Loading" message appears for a few seconds before the content fills in.

2 Comments

What do you recommend to get the same result as the browser? Or how can I do the second fetch?
You could try something like ExecJS to run the JavaScript. Or it looks like the data comes from fooducate.com/internal/chef_client_proxy/get_browse_tree if you want to start with JSON instead.
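If you go the JSON route, a rough sketch with Ruby's standard library might look like the following. It assumes the endpoint answers a plain GET with a JSON body; the http://www. prefix, the HTTP method, and the response shape are all assumptions, and the real page's JavaScript may send parameters or headers this sketch does not. The names BROWSE_TREE_URL, fetch_browse_tree, and parse_browse_tree are hypothetical.

```ruby
require 'net/http'
require 'json'
require 'uri'

# Hypothetical constant: path taken from the comment above; scheme and
# host are assumed to match the page URL in the question.
BROWSE_TREE_URL = 'http://www.fooducate.com/internal/chef_client_proxy/get_browse_tree'

# Hypothetical helper: performs a plain GET (assumed method) and returns
# the raw response body, raising on any non-2xx status.
def fetch_browse_tree(url = BROWSE_TREE_URL)
  response = Net::HTTP.get_response(URI(url))
  raise "HTTP #{response.code}" unless response.is_a?(Net::HTTPSuccess)
  response.body
end

# Hypothetical helper: turns a JSON body into Ruby hashes and arrays.
def parse_browse_tree(body)
  JSON.parse(body)
end
```

Usage would be tree = parse_browse_tree(fetch_browse_tree). Keeping the fetch and the parse separate lets you inspect the raw body first, since the actual structure of the data is unknown here.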
