I'm having trouble with slow delete queries. I have a schema, say "target", containing tables that each have an equivalent table (identical columns and primary keys) in another schema, say "delta". I now want to delete from the target schema all rows that also appear in the delta schema. I have tried this using the DELETE FROM ... WHERE EXISTS approach, but it is incredibly slow. Here's an example query:
DELETE FROM "target".name2phoneme
WHERE EXISTS(
SELECT 1 FROM delta.name2phoneme d
WHERE name2phoneme.NAME_ID = d.NAME_ID
AND name2phoneme.PHONEME_ID = d.PHONEME_ID
);
This is the layout of both tables (with the exception that the tables in the "delta" schema only have primary keys and no foreign keys):
CREATE TABLE name2phoneme
(
name_id uuid NOT NULL,
phoneme_id uuid NOT NULL,
seq_num numeric(3,0),
CONSTRAINT pk_name2phoneme PRIMARY KEY (name_id, phoneme_id),
CONSTRAINT fk_name2phoneme_name_id_2_name FOREIGN KEY (name_id)
REFERENCES name (name_id) MATCH SIMPLE
ON UPDATE NO ACTION
ON DELETE NO ACTION
DEFERRABLE INITIALLY DEFERRED,
CONSTRAINT fk_name2phoneme_phoneme_id_2_phoneme FOREIGN KEY (phoneme_id)
REFERENCES phoneme (phoneme_id) MATCH SIMPLE
ON UPDATE NO ACTION
ON DELETE NO ACTION
DEFERRABLE INITIALLY DEFERRED
)
The "target" table originally contains a little over 18M rows, while the delta table contains about 3.7M rows (that are to be deleted from the target).
Here's the output of EXPLAIN of the above query:
"Delete on name2phoneme (cost=154858.03..1068580.46 rows=6449114 width=12)"
" -> Hash Join (cost=154858.03..1068580.46 rows=6449114 width=12)"
" Hash Cond: ((name2phoneme.name_id = d.name_id) AND (name2phoneme.phoneme_id = d.phoneme_id))"
" -> Seq Scan on name2phoneme (cost=0.00..331148.16 rows=18062616 width=38)"
" -> Hash (cost=69000.01..69000.01 rows=3763601 width=38)"
" -> Seq Scan on name2phoneme d (cost=0.00..69000.01 rows=3763601 width=38)"
I tried to run EXPLAIN ANALYZE on the above query, but it was still executing after more than 2 hours, so I killed it.
Any ideas on how I can optimize this operation?
WHERE (name_id, phoneme_id) IN (SELECT name_id, phoneme_id FROM other_table) and adding an index on (name_id, phoneme_id), but given the size of the tables I wouldn't expect anything blazing fast...
explain DELETE FROM "target".name2phoneme WHERE (name_id, phoneme_id) IN (SELECT d.name_id, d.phoneme_id FROM "delta".name2phoneme d);, which resulted in the same costs as the WHERE EXISTS approach.
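Since the full EXPLAIN ANALYZE run had to be killed, one way to still get measured timings might be to run it against a small sample of the delta rows inside a transaction that is rolled back. The following is only a sketch: the sample size of 10000 and the aliases t and d are arbitrary assumptions, and the planner may well pick a different plan for a small sample than for the full 3.7M rows, so the numbers are indicative at best.

```sql
-- EXPLAIN ANALYZE really executes the DELETE, so wrap it in a
-- transaction and roll back to leave the target table unchanged.
BEGIN;

EXPLAIN (ANALYZE, BUFFERS)
DELETE FROM "target".name2phoneme t
USING (
    -- Hypothetical sample: an arbitrary 10000 rows from the delta table.
    SELECT name_id, phoneme_id
    FROM "delta".name2phoneme
    LIMIT 10000
) d
WHERE t.name_id = d.name_id
  AND t.phoneme_id = d.phoneme_id;

-- Undo the sampled delete.
ROLLBACK;
```

If the sampled run is fast, the time is probably going somewhere other than the join itself; if it is slow, the per-node timings in the output should show where.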