I am reading the code of someone who built a domain crawler method in Ruby. I am new to the concept of recursion and cannot wrap my head around how to read their code.
Code:
def crawl_domain(url, page_limit = 100)
  return if @already_visited.size == page_limit # [1]
  url_object = open_url(url)
  return if url_object == nil # [2]
  parsed_url = parse_url(url_object)
  return if parsed_url == nil # [3]
  @already_visited[url] = true if @already_visited[url] == nil
  page_urls = find_urls_on_page(parsed_url, url)
  page_urls.each do |page_url|
    if urls_on_same_domain?(url, page_url) and @already_visited[page_url] == nil
      crawl_domain(page_url)
    end
  end
end
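To test my understanding of the recursive part, I wrote this simplified, self-contained version of the same pattern (no network calls; the "domain" is just a hard-coded hash mapping each page to the pages it links to, and all names here are my own, not from the original code):

```ruby
# Hard-coded stand-in for a domain: page => pages it links to.
PAGES = {
  "a" => ["b", "c"],
  "b" => ["a", "c"],
  "c" => ["d"],
  "d" => []
}

def crawl(page, visited = {}, page_limit = 100)
  return visited if visited.size == page_limit # guard: stop at the limit
  return visited unless PAGES.key?(page)       # guard: page can't be "opened"
  visited[page] = true
  PAGES[page].each do |link|
    # Recurse only into pages we haven't seen, so the recursion terminates
    # even though "a" and "b" link to each other.
    crawl(link, visited, page_limit) unless visited[link]
  end
  visited
end

crawl("a").keys # => ["a", "b", "c", "d"]
```

Is this the same idea, i.e. the shared visited hash is what stops the recursion from looping forever on pages that link back to each other?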
Questions:
- What does the combination of consecutive return statements mean?
- At line [1], if @already_visited's size is NOT the same as page_limit, does the program break out of the crawl_domain method and skip the rest of the code?
- If @already_visited's size IS the same as page_limit, does it move on to the next return statement after it sets url_object = open_url(url)?
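To make the first question concrete, here is a tiny standalone sketch of what I think the "consecutive return" (guard clause) pattern does; the method and its names are my own invention, not from the crawler:

```ruby
# Each return bails out of the whole method immediately when its condition
# is true; otherwise execution falls through to the next line.
def describe_number(n)
  return "not a number" unless n.is_a?(Numeric) # guard 1
  return "negative" if n < 0                    # guard 2
  return "zero" if n.zero?                      # guard 3
  "positive"                                    # reached only if no guard fired
end

describe_number("x") # => "not a number"
describe_number(5)   # => "positive"
```

Is my reading right that each return in crawl_domain works like these guards, exiting the method entirely rather than "moving on" to the next return?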
Thanks for any help in advance!
Source: http://www.skorks.com/2009/07/how-to-write-a-web-crawler-in-ruby/