
I'm trying to recursively crawl through all the available links on a page: if a link validates as working, pull all of the links from that page and add them to the list to be crawled once the current page is finished. However, I think I've hit a problem with using conj on my sequence of links.

When I run my code, it only appears to process the initial list of links I feed in when I first call the function.

(defn process-links
  [links]
  (if (not (empty? links))
    (do
      (if (not (is-working (first links)))
        (println (str (first links) " is not working"))
        (conj (get-links (first links)) links))
      (recur (rest links)))))

I'm not quite sure why it's not adding the additional items to the list. Can anyone suggest why it's doing this?

2 Answers


Clojure's data structures are immutable. You're not doing anything with the data structure returned from:

(conj (get-links (first links)) links)
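For example, at the REPL:

user=> (def v [1 2])
#'user/v
user=> (conj v 3)
[1 2 3]
user=> v
[1 2]

conj returned a new vector; v itself is unchanged. The same applies to your conj call: unless you pass the result along (e.g. via recur), it is simply discarded.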

A few additional things:

  • the above conjoins the current seq of links as a single element onto whatever get-links returns; that's probably not what you want (see the sketch after this list).
  • this might be a good time to learn how to work with and/or generate lazy sequences.
  • watch out for cycles.
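
Here's a minimal sketch of one way to thread the accumulated links through the recursion, assuming your is-working and get-links functions behave as in the question (is-working returns truthy for a reachable URL, get-links returns the URLs found on a page); the visited set is a crude guard against re-crawling the same URL:

(defn process-links
  [links]
  (loop [queue   (vec links)
         visited #{}]
    (when (seq queue)
      (let [link (first queue)]
        (cond
          ;; skip anything we've already crawled
          (contains? visited link)
          (recur (subvec queue 1) visited)

          (not (is-working link))
          (do
            (println (str link " is not working"))
            (recur (subvec queue 1) (conj visited link)))

          :else
          ;; `into` returns a NEW queue with the page's links appended;
          ;; the key point is that we recur with that value instead of
          ;; discarding it
          (recur (into (subvec queue 1) (get-links link))
                 (conj visited link)))))))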

1 Comment

OK, I'll look into the details of working with lazy sequences and cycles. Thanks.

This looks like a fun opportunity to use tree-seq: build a tree of URLs, where each URL's "children" are determined by slurping its text and looking for more links. Then, aside from the cycle problem Alex alludes to, you can just walk over the seq of links like any other sequence.
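
A minimal sketch of that idea, reusing is-working and get-links from the question (and, as noted, with no cycle protection):

(defn crawl-seq
  [root-url]
  ;; branch? = is-working, children = get-links: a working URL's
  ;; children are the links found on its page
  (tree-seq is-working get-links root-url))

;; then walk it like any other lazy sequence:
(doseq [url (take 50 (crawl-seq "http://example.com"))]
  (println url))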

