I am frustrated trying to use Ruby to fetch the content of a specific URL.

I've tried many different approaches, such as open-uri and a standard Net::HTTP request, and none has worked so far: I always get empty HTML. I also tried fetching the same URL with Python, which returned the correct HTML content every time. I am really not sure why. Please help, as I am a newbie to both Ruby and Python. I want to use Ruby (I prefer the tidy syntax and human-friendly function names, and installing libraries with gem and Homebrew on a Mac is easier than with Python's easy_install), but I am now considering Python because it just works (though I am still trying to get my head around the 2.x versus 3.x issue). I may be doing something really stupid, but I think that is very unlikely.

ruby 1.9.2p136 (2010-12-25 revision 30365) [i386-darwin10.6.0]

Implementation 1:

url = URI.parse('http//:www.stackoverflow.com/') req = Net::HTTP::Get.new(url.path)
res = Net::HTTP.start(url.host, url.port) {|http|   http.request(req) }    
puts res.body #empty

Implementation 2:

doc = Nokogiri::HTML(open("http//:www.stackoverflow.com/", "User-Agent" => "Safari"))
#empty
#I tried to use without user agent, without Nokogiri none worked.

Python implementation, which worked perfectly every time:

f = urllib.urlopen("http//:www.stackoverflow.com/")
# Read from the object, storing the page's contents in 's'.
s = f.read()
f.close()

print s
  • maybe you need to follow a redirect? Commented Jan 31, 2011 at 21:28
  • "http:www.url.com" is probably just an example, OK, but what happened to the "//" part? Anyway, you should post the real URL you are trying to download; otherwise there is nothing to test, only to guess at. Commented Jan 31, 2011 at 21:33
  • It's interesting that you say your Python works. I get an error saying there's an HTTP error, "no host given". Commented Jan 31, 2011 at 22:06
  • for example www.yellowpages.com.au/search/listings?clue=plumber&locationClue=Australia Commented Jan 31, 2011 at 22:34
  • Thanks very much for the responses. I tested the code given in the answers below, and none has worked so far. With the Python code above, if you change the URL to the Yellow Pages one, it shows the actual HTML. Commented Jan 31, 2011 at 22:40

2 Answers

If that is your exact code, it is invalid for several reasons.

  1. http: should be http://
  2. The URL needs a path. If you want the root page of example.com, it needs to be http://example.com/; the trailing slash is significant.
  3. If you put two statements on one line, you need ; to mark the end of the first one.
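
Points 1 and 2 can be checked without touching the network by looking at what URI.parse returns. This is a minimal sketch: the example.com URLs are placeholders, and the shortened query string is made up for illustration. It also shows request_uri, which is not used in the question but keeps the query string that path drops.

```ruby
require 'uri'

root  = URI.parse('http://example.com')   # no trailing slash
slash = URI.parse('http://example.com/')  # root path given explicitly
query = URI.parse('http://example.com/search/listings?clue=plumber')

puts root.path.inspect   # ""  (no path at all, so there is nothing to request)
puts slash.path          # /
puts query.path          # /search/listings  (query string is dropped)
puts query.request_uri   # /search/listings?clue=plumber
puts root.request_uri    # /   (request_uri falls back to the root path)
```

For a URL that carries parameters, passing request_uri rather than path to Net::HTTP::Get.new is the safer choice, since path alone silently requests a different resource.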

So:

require 'net/http'

url = URI.parse('http://www.yellowpages.com.au/search/listings?clue=plumber&locationClue=Australia')
# request_uri keeps the query string; url.path alone would drop "?clue=...".
req = Net::HTTP::Get.new(url.request_uri)
res = Net::HTTP.start(url.host, url.port) { |http| http.request(req) }
puts res.body

The same is true when using open with Nokogiri.
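
Net::HTTP does not follow redirects on its own, so a 301 or 302 response can look like an empty or nearly empty page. That is one way to check the redirect suggestion from the comments on the question. The sketch below assumes a reasonably recent Ruby (it relies on the :use_ssl option of Net::HTTP.start); redirect_target and get_following_redirects are hypothetical helper names, and the limit of 5 is arbitrary.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: if res is a redirect with a Location header, return the
# absolute URI it points to (relative locations are resolved against uri).
# Returns nil for any other response.
def redirect_target(uri, res)
  return nil unless res.is_a?(Net::HTTPRedirection) && res['location']
  URI.join(uri.to_s, res['location'])
end

# Hypothetical helper: GET url, making at most `limit` requests while
# following redirects. Returns the first non-redirect response.
def get_following_redirects(url, limit = 5)
  uri = URI.parse(url)
  limit.times do
    res = Net::HTTP.start(uri.host, uri.port, :use_ssl => uri.scheme == 'https') do |http|
      http.request(Net::HTTP::Get.new(uri.request_uri))
    end
    target = redirect_target(uri, res)
    return res if target.nil?
    uri = target
  end
  raise "gave up after #{limit} requests for #{url}"
end
```

Calling get_following_redirects('http://www.stackoverflow.com/').body would then print the page that the final URL serves, rather than the body of the redirect response.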

EDIT: That site returns bad results much of the time:

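require 'net/http'

url = URI.parse('http://www.yellowpages.com.au/search/listings?clue=plumber&locationClue=Australia')
counter = 0

# Request the page 20 times and count how many responses have a non-empty body.
20.times do
  req = Net::HTTP::Get.new(url.request_uri) # request_uri keeps the query string
  res = Net::HTTP.start(url.host, url.port) { |http| http.request(req) }
  sleep 1
  counter += 1 unless res.body.empty?
end

puts counter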

For me, this returned a non-empty body only once. If you substitute another site, it works every time.

curl "http://www.yellowpages.com.au/search/listings?clue=plumber&locationClue=Australia"

Yields the same inconsistent results.
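
If the site really does return an empty body intermittently, one workaround is to retry a few times with explicit timeouts. This is a minimal sketch, not a fix for the site itself: fetch_once and fetch_with_retries are hypothetical helper names, and the attempt count, pause, and timeout values are arbitrary. It handles plain http URLs only.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: one GET with explicit timeouts; returns the body as a
# string (possibly empty).
def fetch_once(uri, timeout = 10)
  http = Net::HTTP.new(uri.host, uri.port)
  http.open_timeout = timeout # seconds to wait for the connection
  http.read_timeout = timeout # seconds to wait for each read
  http.request(Net::HTTP::Get.new(uri.request_uri)).body.to_s
end

# Hypothetical helper: try up to `attempts` times, pausing `pause` seconds
# between tries, and return the first non-empty body (or nil if none).
# An optional block replaces fetch_once, which keeps the helper easy to test.
def fetch_with_retries(url, attempts = 5, pause = 1)
  uri = URI.parse(url)
  attempts.times do |i|
    begin
      body = block_given? ? yield(uri) : fetch_once(uri)
      return body unless body.empty?
    rescue Timeout::Error
      # Treat a timeout like an empty response and try again.
    end
    sleep pause if i < attempts - 1
  end
  nil
end
```

A call such as fetch_with_retries('http://www.yellowpages.com.au/search/listings?clue=plumber&locationClue=Australia') returns nil when every attempt comes back empty, so the caller can tell "site misbehaving" apart from a real page.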

7 Comments

I've given the URL I was testing with in a comment above. I've tested your code again and still get an empty result.
I am getting intermittent results with that site. I think it returns an empty body most of the time. Run that code a bunch of times and you will see. If I run it against yahoo.com, it works every time.
What I am curious about is why the Python code returns the correct HTML every single time, whereas the Ruby code returns an empty result most of the time. I am still trying to suss it out, because if the Ruby library isn't "reliable", I should consider using Python for my particular case.
The Ruby library is reliable; the site is not. I have no idea why it works in Python every time. If I run curl in my shell (nothing to do with Ruby), I get blank results half the time too. I do not think curl and Net::HTTP are broken; I think the site is. Try running an example similar to mine in Python (i.e. a loop of 20 hits); I do not think you will get 100% results.
Is there a way to increase the chance of getting a more consistent result in Ruby, such as a longer timeout? I also think the site has issues, as I just tested a few other sites and most of them worked as expected.

Two examples with open-uri (standard library), a wrapper for (among others) the rather cumbersome Net::HTTP:

require 'open-uri'

open("http://www.stackoverflow.com/"){|f| puts f.read}

puts URI::parse("http://www.google.com/").read

1 Comment

I've given the URL I was testing with in a comment above. With the open() method I get an error; with URI::parse I get an empty result, as I normally would.
