
I use PostgreSQL 9.4 for a model database. My table looks somewhat like this:

CREATE TABLE table1 (
    sid INTEGER PRIMARY KEY NOT NULL DEFAULT nextval('table1_sid_seq'::regclass),
    col1 INT,
    col2 INT,
    col3 JSONB
);

My Python 2.7 workflow often looks like this:

curs.execute("SELECT sid, col1, col2 FROM table1")
data = curs.fetchall()
putback = []
for i in data: 
    result = do_something(i[1], i[2])
    putback.append((sid, result))
del data
curs.execute("UPDATE table1
              SET col3 = p.result
              FROM unnest(%s) p(sid INT, result JSONB)
              WHERE sid = p.sid", (putback,))

This typically works quite well and efficiently. However, for large queries PostgreSQL memory use will sometimes go through the roof (>50 GB) during the UPDATE command, and I believe the backend is being killed by OS X, because I get WARNING: terminating connection because of crash of another server process. My MacBook Pro has 16 GB of RAM, and the query in question has 11M rows with about 100 characters of data to write back per row.
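For scale, a quick back-of-the-envelope estimate of what that write-back amounts to (just arithmetic on the numbers above, not a measurement):

# Rough arithmetic only: raw size of the data shipped in the single UPDATE,
# ignoring protocol overhead and whatever the server allocates while parsing
# and unnesting the one huge array parameter.
rows = 11 * 10**6        # ~11M rows to write back
bytes_per_row = 100      # ~100 characters of JSONB text per row
print("raw payload: about %.1f GB" % (rows * bytes_per_row / 1e9))  # ~1.1 GB

So the raw payload is on the order of 1 GB, far below the >50 GB the server ends up using, which suggests the blow-up happens server-side while the single array parameter is parsed and unnested (see the comments below).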

My postgresql.conf:

default_statistics_target = 50
maintenance_work_mem = 512MB 
constraint_exclusion = on 
checkpoint_completion_target = 0.9
effective_cache_size = 4GB 
work_mem = 256MB 
wal_buffers = 16MB 
checkpoint_segments = 128 
shared_buffers = 1024MB 
max_connections = 80

So I wonder:

  1. Why does my query sometimes consume excessive amounts of RAM?
  2. How can I control memory use while still guaranteeing good performance?
  3. Is there a good guideline or tool for tuning PostgreSQL?

Update:
I am pretty sure that @wildplasser pinpointed my problem. In the comments he suggests dumping the data into the database first and unpacking it from there. Unfortunately, I could not figure out how to implement his proposal. If anyone has an idea of how to do that, their answer will be gladly accepted.
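For concreteness, my rough and so far untested reading of his suggestion is the sketch below; the staging-table name putback_tmp and the use of psycopg2's copy_from are my own guesses at what he means:

import io

# Untested sketch: stream putback into a temp table via COPY, then update via a join.
# Assumes each result is a JSON string containing no tabs, newlines or backslashes
# (those would need COPY text-format escaping).
buf = io.BytesIO()                       # copy_from() reads from a file-like object
for sid, result in putback:
    buf.write("%d\t%s\n" % (sid, result))
buf.seek(0)

curs.execute("CREATE TEMP TABLE putback_tmp (sid INT PRIMARY KEY, result JSONB)")
curs.copy_from(buf, "putback_tmp", columns=("sid", "result"))
curs.execute("""UPDATE table1
                SET col3 = p.result
                FROM putback_tmp p
                WHERE table1.sid = p.sid""")

If that is what he has in mind, COPY streams the rows instead of building one giant array literal, and the UPDATE becomes an ordinary join against an indexed temp table.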

  • 1) Your work_mem is (rather) high, and you (probably) have no table structure. 2) Design your database. 3) See 2. BTW: your select query fetches all the rows (and I don't understand your update query). Commented Dec 25, 2015 at 15:23
  • Yes. Your table might look like a spreadsheet, I don't know. Yes. Commented Dec 25, 2015 at 15:27
  • BTW: I don't understand a word of your Python stuff, but it looks like you are sucking the entire db-table into a Python set or array, and using that (in exploded form) to update the same table. Commented Dec 25, 2015 at 15:33
  • I would advise to first save "putback" into a temp table (plus: add a PK to this temp), and do the update from there. (As I understand it, the way it works now is: the "unnest" first builds a huge array, and then unpacks it; all in-memory.) Commented Dec 25, 2015 at 15:50
  • No, you should not construct the temp table by exploding a huge array (which is very hard for the parser, if I understand your ORM correctly). Instead, explode the "array" locally and copy it to a remote table (take care not to operate on a tuple-at-a-time basis, which would take a lot of time but would not fail). Commented Dec 25, 2015 at 22:32

1 Answer

My workaround is to slice putback with a simple function as proposed here:

def chunk(l, n):
    n = max(1, n)
    return [l[i:i + n] for i in range(0, len(l), n)]

and then

for batch in chunk(putback, 250000):
    curs.execute("""UPDATE table1
                    SET col3 = p.result
                    FROM unnest(%s) p(sid INT, result JSONB)
                    WHERE sid = p.sid""", (batch,))

This works, i.e. it keeps the memory footprint in check, but it is not very elegant and is slower than dumping all the data at once, as I usually do.
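A small, untested variation on this (my own tweak, not part of the workaround above) is to make the chunker a generator, so the slices are produced one at a time instead of all being collected in a list up front:

def chunks(l, n):
    # Yield successive n-sized slices of l lazily.
    n = max(1, n)
    for i in range(0, len(l), n):
        yield l[i:i + n]

for batch in chunks(putback, 250000):
    curs.execute("""UPDATE table1
                    SET col3 = p.result
                    FROM unnest(%s) p(sid INT, result JSONB)
                    WHERE sid = p.sid""", (batch,))

This only trims the Python-side footprint; the server-side memory per statement is still governed by the chunk size.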
