Some friends and I have been working on a set of scripts that make it easier to work on the machines at uni. One of these tools currently uses Nokogiri, but we want the tools to run on every machine with as little setup as possible, so we've been looking for a 'native' HTML parser (one that ships with Ruby) rather than requiring users to install RVM and custom gems, which most users can't do because of disk space limits.

Are we pretty much restricted to Nokogiri/Hpricot? Should we look at just writing our own custom parser that fits our needs?

Cheers.

EDIT: If there are posts on here that I've missed in my searches, let me know! S.O. is sometimes just too large to find things effectively...

  • Given that the gems are all open source, you can always extract what you need from a gem and use it in a custom parser, then you only have to deliver your own code... Commented Feb 25, 2012 at 15:59
  • I'd sure recommend against writing your own. Commented Feb 25, 2012 at 16:03
  • It will be much more reliable to use existing solutions. And what @MarcTalbot said above is key: if a gem is open-source, you can just copy the source into your application (assuming that you do not require non-GPL libraries). Commented Feb 25, 2012 at 16:16
  • This may be a duplicate question: stackoverflow.com/questions/2554909/… Commented Feb 25, 2012 at 16:39
  • Yeah, our only problem is that the whole suite of tools is currently about 5MB, so to add all the libs for nokogiri (for example) bumps the package up to about 7MB. We were hoping there might be something small! No worries though, I'll take a look at using existing stuff packaged up. Commented Feb 25, 2012 at 18:17

1 Answer

There is no HTML parser in the Ruby stdlib.
HTML parsers have to be more forgiving of bad markup than XML parsers.

You could run the HTML through tidy (http://tidy.sourceforge.net)
to clean it up and produce valid markup.
That output can then be read with REXML :-) which is in the stdlib.
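
A minimal sketch of that tidy-then-REXML idea, under some assumptions: a `tidy` binary is on the PATH, and both helper names (`tidy_to_xhtml`, `link_targets`) are made up for illustration. The `-numeric` flag asks tidy to emit numeric entities, since REXML does not know HTML names like `&nbsp;`. On Ruby 3.0 and later REXML ships as a bundled gem rather than a default library, so check that `require "rexml/document"` still works on the target machines.

```ruby
require "open3"
require "rexml/document"

# Hypothetical helper: pipe raw HTML through the external `tidy` binary and
# return well-formed XHTML. Assumes `tidy` is installed and on the PATH.
def tidy_to_xhtml(html)
  out, _err, status = Open3.capture3(
    "tidy", "-quiet", "-asxhtml", "-numeric", "-utf8",
    stdin_data: html
  )
  # tidy exits with 1 when it only issued warnings and 2 on real errors.
  raise "tidy failed" if status.exitstatus.nil? || status.exitstatus > 1
  out
end

# Hypothetical helper: collect the href of every <a> element in a
# well-formed XHTML string, using only REXML.
def link_targets(xhtml)
  doc = REXML::Document.new(xhtml)
  hrefs = []
  # Walk all descendants of the root and compare local names, which avoids
  # depending on how XPath treats the default XHTML namespace.
  doc.root.each_recursive do |el|
    href = el.attributes["href"]
    hrefs << href if el.name == "a" && href
  end
  hrefs
end
```

Usage would look something like `link_targets(tidy_to_xhtml(File.read("page.html")))`. Only `link_targets` is stdlib-only; the tidy step still depends on an external program being available on each machine.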

REXML is much slower than Nokogiri, at least when I last checked in 2009.
Sam Ruby had been working on making REXML faster, though.

A better way would be to improve the deployment.
Take a look at http://gembundler.com/bundle_package.html, and at using Capistrano (or some such) to provision servers.
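
If the tools ship with library source copied into the package, as suggested in the comments on the question, a small loader can put that code on the load path without touching the system gems. This is only a sketch: the `vendor/<name>/lib` layout and the `add_vendored_libs` helper are assumptions, and it only helps for pure-Ruby libraries (Nokogiri itself needs compiled extensions).

```ruby
# Hypothetical helper: prepend every vendor/<name>/lib directory under
# `root` to Ruby's load path so that `require "<name>"` finds the bundled
# copy. The directory layout is an assumption made for this sketch.
def add_vendored_libs(root)
  Dir.glob(File.join(root, "vendor", "*", "lib")).sort.each do |lib|
    $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
  end
end
```

A script in the package's `bin/` directory could call `add_vendored_libs(File.expand_path("..", __dir__))` before its first `require`.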

3 Comments

Thanks. The problem with deployment is that the tools run on university-managed machines, so anything we install has to go in the user's home directory, which has a limited amount of space: few people have enough room to install something like RVM with custom gems. This is also pure Ruby, not Rails.
Another option might be to create and consume an API. The advantage is that the code is deployed on only one machine, which saves space. Do benchmark the speed of an API call, though.
These aren't those sorts of tools: they're command-line utilities that do things like wrap lpr in an easy-to-use tool. Thanks though.
