
I use PostgreSQL 9.4 for a model database. My table looks somewhat like this:

CREATE TABLE table1 (
    sid INTEGER PRIMARY KEY NOT NULL DEFAULT nextval('table1_sid_seq'::regclass),
    col1 INT,
    col2 INT,
    col3 JSONB
);

My Python 2.7 workflow often looks like this:

curs.execute("SELECT sid, col1, col2 FROM table1")
data = curs.fetchall()
putback = []
for i in data: 
    result = do_something(i[1], i[2])
    putback.append((sid, result))
del data
curs.execute("UPDATE table1
              SET col3 = p.result
              FROM unnest(%s) p(sid INT, result JSONB)
              WHERE sid = p.sid", (putback,))

This typically works quite well and efficiently. However, for large queries PostgreSQL memory use will sometimes go through the roof (>50 GB) during the UPDATE command, and I believe the backend is being killed by OS X, because I get WARNING: terminating connection because of crash of another server process. My MacBook Pro has 16 GB of RAM, and the query in question has 11M rows with about 100 characters of data to write back per row.
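For scale, a quick back-of-the-envelope estimate of what that write-back amounts to (just arithmetic on the numbers above, not a measurement):

# Rough arithmetic only: raw size of the data shipped in the single UPDATE,
# ignoring protocol overhead and whatever the server allocates while parsing
# and unnesting the one huge array parameter.
rows = 11 * 10**6        # ~11M rows to write back
bytes_per_row = 100      # ~100 characters of JSONB text per row
print("raw payload: about %.1f GB" % (rows * bytes_per_row / 1e9))  # ~1.1 GB

So the raw payload is on the order of 1 GB, far below the >50 GB the server ends up using, which suggests the blow-up happens server-side while the single array parameter is parsed and unnested (see the comments below).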

My postgresql.conf:

default_statistics_target = 50
maintenance_work_mem = 512MB 
constraint_exclusion = on 
checkpoint_completion_target = 0.9
effective_cache_size = 4GB 
work_mem = 256MB 
wal_buffers = 16MB 
checkpoint_segments = 128 
shared_buffers = 1024MB 
max_connections = 80

So I wonder:

  1. Why does my query sometimes consume excessive amounts of RAM?
  2. How can I control memory use while still guaranteeing good performance?
  3. Is there a good guideline or tool for tuning PostgreSQL?

Update:
I am pretty sure that @wildplasser pinpointed my problem. In the comments he suggests dumping the data into the database first and unpacking it from there. Unfortunately, I could not figure out how to implement his proposal. If anyone has an idea of how to do that, their answer will be gladly accepted.
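For concreteness, my rough and so far untested reading of his suggestion is the sketch below; the staging-table name putback_tmp and the use of psycopg2's copy_from are my own guesses at what he means:

import io

# Untested sketch: stream putback into a temp table via COPY, then update via a join.
# Assumes each result is a JSON string containing no tabs, newlines or backslashes
# (those would need COPY text-format escaping).
buf = io.BytesIO()                       # copy_from() reads from a file-like object
for sid, result in putback:
    buf.write("%d\t%s\n" % (sid, result))
buf.seek(0)

curs.execute("CREATE TEMP TABLE putback_tmp (sid INT PRIMARY KEY, result JSONB)")
curs.copy_from(buf, "putback_tmp", columns=("sid", "result"))
curs.execute("""UPDATE table1
                SET col3 = p.result
                FROM putback_tmp p
                WHERE table1.sid = p.sid""")

If that is what he has in mind, COPY streams the rows instead of building one giant array literal, and the UPDATE becomes an ordinary join against an indexed temp table.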

  • 1) Your work_mem is (rather) high, and you (probably) have no table structure. 2) Design your database. 3) See 2. BTW: your select query fetches all the rows (and I don't understand your update query). Commented Dec 25, 2015 at 15:23
  • Yes. Your table might look like a spreadsheet, I don't know. Yes. Commented Dec 25, 2015 at 15:27
  • BTW: I don't understand a word of your Python stuff, but it looks like you are sucking the entire db-table into a Python set or array, and using that (in exploded form) to update the same table. Commented Dec 25, 2015 at 15:33
  • I would advise to first save "putback" into a temp table (plus: add a PK to this temp), and do the update from there. (As I understand it, the way it works now is: the "unnest" first builds a huge array, and then unpacks it; all in-memory.) Commented Dec 25, 2015 at 15:50
  • No, you should not construct the temp table by exploding a huge array (which is very hard for the parser, if I understand your ORM correctly). Instead, explode the "array" locally and copy it to a remote table (take care not to operate on a tuple-at-a-time basis, which would take a lot of time but would not fail). Commented Dec 25, 2015 at 22:32

1 Answer

My workaround is to slice putback with a simple function as proposed here:

def chunk(l, n):
    n = max(1, n)
    return [l[i:i + n] for i in range(0, len(l), n)]

and then

for batch in chunk(putback, 250000):
    curs.execute("""UPDATE table1
                    SET col3 = p.result
                    FROM unnest(%s) p(sid INT, result JSONB)
                    WHERE sid = p.sid""", (batch,))

This works, i.e. it keeps the memory footprint in check, but it is not very elegant and is slower than dumping all the data at once, as I usually do.
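A small, untested variation on this (my own tweak, not part of the workaround above) is to make the chunker a generator, so the slices are produced one at a time instead of all being collected in a list up front:

def chunks(l, n):
    # Yield successive n-sized slices of l lazily.
    n = max(1, n)
    for i in range(0, len(l), n):
        yield l[i:i + n]

for batch in chunks(putback, 250000):
    curs.execute("""UPDATE table1
                    SET col3 = p.result
                    FROM unnest(%s) p(sid INT, result JSONB)
                    WHERE sid = p.sid""", (batch,))

This only trims the Python-side footprint; the server-side memory per statement is still governed by the chunk size.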
