8

I have a query that is basically a select *. In development this table is only 30000 rows, but in production it will be much bigger. So I want to consume this query lazily. Why is the query below not lazy? I am using Postgres 9.5.4.1.

(do
  (def pg-uri {:connection-uri "jdbc:postgresql://localhost/..."})
  (def row (atom 0))
  (take 10 (clojure.java.jdbc/query 
          pg-uri
          ["select * from mytable"]
          {:fetch-size 10
           :auto-commit false
           :row-fn (fn [r] (swap! row inc))}))
  @row) ;;=> 30000

3 Answers

10

First, see https://jdbc.postgresql.org/documentation/83/query.html#query-with-cursor.

I solved it like this:

(jdbc/with-db-transaction [tx connection]
  (jdbc/query tx
              [(jdbc/prepare-statement (:connection tx)
                                       "select * from mytable"
                                       {:fetch-size 10})]
              {:result-set-fn (fn [result-set] ...)}))

where :result-set-fn is a function that consumes the lazy result set.

with-db-transaction takes care of setting autoCommit to false. :fetch-size is not passed down by query, so you have to create the prepared statement yourself with prepare-statement.
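For example, here is a minimal sketch of a :result-set-fn that consumes the rows one at a time (just a running count), using the same transaction/prepare-statement setup as above; connection and mytable are placeholders as in the snippet above:

(require '[clojure.java.jdbc :as jdbc])

;; reduce walks the lazy result set row by row, so with :fetch-size set on
;; the prepared statement and auto-commit off (courtesy of the transaction),
;; only small batches are pulled from Postgres at a time and the full result
;; set is never held in memory.
(jdbc/with-db-transaction [tx connection]
  (jdbc/query tx
              [(jdbc/prepare-statement (:connection tx)
                                       "select * from mytable"
                                       {:fetch-size 10})]
              {:result-set-fn (fn [rs]
                                (reduce (fn [n _] (inc n)) 0 rs))}))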



8

clojure.java.jdbc supports lazy processing of large result sets natively these days (the other answers here predate that native support). See the community documentation about it here:

http://clojure-doc.org/articles/ecosystem/java_jdbc/using_sql#processing-a-result-set-lazily

In particular, see the Additional Options? section for database-specific tweaks you might need. You can specify :auto-commit? false on any function that would open a new connection, and you can specify :fetch-size and the various cursor controls on any query-related function. See this StackOverflow question & answer for details of what PostgreSQL might need:

Java JDBC ignores setFetchSize?

Currently, you'll have to dig in the clojure.java.jdbc source or the prepare-statement reference documentation for more of those options. I'm continuing to work on the community documentation to surface all of that information.
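As a rough sketch of the reducible-query route (mentioned in the comments below), assuming clojure.java.jdbc 0.7.0+; the db-spec and table name are placeholders, and passing :auto-commit? false in the options map relies on the query opening its own connection as described above:

(require '[clojure.java.jdbc :as jdbc])

(def db-spec {:connection-uri "jdbc:postgresql://localhost/mydb"})

;; reducible-query returns a reducible rather than a realized sequence, so
;; rows are only fetched from the cursor as the reduction consumes them.
;; :fetch-size and :auto-commit? false are the PostgreSQL-specific tweaks
;; discussed in the linked question & answer.
(reduce (fn [n _] (inc n))
        0
        (jdbc/reducible-query db-spec
                              ["select * from mytable"]
                              {:auto-commit? false
                               :fetch-size   1000}))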


0

You don't need the transaction and prepared-statement machinery above. It is the use of :result-set-fn that causes the lazy sequence to be consumed; you may have meant to use :row-fn instead.

For full details please see The Clojure Cookbook. A print version is also available, which I recommend.

The jdbc/query function has several optional keyword parameters that control how it constructs the returned result set. The :result-set-fn parameter specifies a function that is applied to the entire result set (a lazy sequence) before it is returned. The default argument is the doall function:

(defn hi-lo [rs] [(first rs) (last rs)])

;; Find the highest- and lowest-cost fruits
(jdbc/query db-spec
            ["select * from fruit order by cost desc"]
            :result-set-fn hi-lo)
;; -> [{:grade nil, :unit nil, :cost 77, :appearance nil, :name "Kumquat"}
;;     {:grade 1.4, :unit nil, :cost 10, :appearance "rotten", :name "Tomato"}]

The :row-fn parameter specifies a function that is applied to each result row as the result is constructed. The default argument is the identity function:

(defn add-tax [row] (assoc row :tax (* 0.08 (row :cost))))

(jdbc/query db-spec
            ["select name,cost from fruit where cost = 12"]
            :row-fn add-tax)
;; -> ({:tax 0.96, :cost 12, :name "Plum"} {:tax 0.96, :cost 12, :name "Fig"})

5 Comments

Unfortunately this didn't work with Postgres. See the link in my answer. You need to set autoCommit to false and the fetchSize to make fetching with a cursor work, or else the whole result set will still be held in memory. Verify by turning on the Postgres logs: LOG: execute fetch from S_1/C_2: select * from ... The word 'fetch' here is what you should see when using a cursor. Another check you can do is a 'select all' on a big table and record the time vs. a limited query. You'll see that the limited query is much faster, but with fetching it doesn't matter.
"You'll see that the limited query is much faster, but when using a cursor it doesn't matter" => that is, if you only realize, for example, the first element.
So perhaps it is only lazy on the Clojure side, not on the JDBC/Postgres side then... I haven't experimented that far. You may wish to send this question to the Clojure Google Group, as the maintainer of clojure.java.jdbc is frequently there.
Exactly. I already found the answer (stackoverflow.com/a/39775018/6264). There's an open issue for :fetch-size not being passed down to prepare-statement in clojure.java.jdbc.
I've added an answer here that talks about setting :auto-commit? and :fetch-size and the new (0.7.0+) reducible-query function as the best way to deal with this, with links to the recently updated community documentation about clojure.java.jdbc.
