How can I make this delete query being run on postgres faster?

Question

There are two schemas in the same database, oatarchival and oat. The two schemas have identical structure.

Here is the query I am running, which takes a very long time:

DELETE FROM oat.oat_user_tag_verification
USING oatarchival.oat_user_tag_verification outv, oat.fp_archived f
WHERE outv.tag_id = f.tag_id
  AND f.is_archived = false
  AND oat_user_tag_verification.user_id = outv.user_id
  AND oat_user_tag_verification.tag_id = outv.tag_id
  AND oat_user_tag_verification.verification_status = outv.verification_status
  AND oat_user_tag_verification.created_at = outv.created_at
  AND oat_user_tag_verification.updated_at = outv.updated_at;

Here is the EXPLAIN VERBOSE output for this query:

"Delete on oat.oat_user_tag_verification  (cost=14989031.30..16227081.67 rows=1 width=18)"
"  ->  Nested Loop  (cost=14989031.30..16227081.67 rows=1 width=18)"
"        Output: oat_user_tag_verification.ctid, outv.ctid, f.ctid"
"        Join Filter: (outv.tag_id = f.tag_id)"
"        ->  Merge Join  (cost=14989031.30..16021422.32 rows=1 width=28)"
"              Output: oat_user_tag_verification.ctid, oat_user_tag_verification.tag_id, outv.ctid, outv.tag_id"
"              Merge Cond: ((oat_user_tag_verification.tag_id = outv.tag_id) AND (oat_user_tag_verification.user_id = outv.user_id) AND (oat_user_tag_verification.verification_status = outv.verification_status) AND (oat_user_tag_verification.created_at = ou (...)"
"              ->  Sort  (cost=13223314.06..13368102.38 rows=57915328 width=38)"
"                    Output: oat_user_tag_verification.ctid, oat_user_tag_verification.user_id, oat_user_tag_verification.tag_id, oat_user_tag_verification.verification_status, oat_user_tag_verification.created_at, oat_user_tag_verification.updated_at"
"                    Sort Key: oat_user_tag_verification.tag_id, oat_user_tag_verification.user_id, oat_user_tag_verification.verification_status, oat_user_tag_verification.created_at, oat_user_tag_verification.updated_at"
"                    ->  Seq Scan on oat.oat_user_tag_verification  (cost=0.00..1005001.28 rows=57915328 width=38)"
"                          Output: oat_user_tag_verification.ctid, oat_user_tag_verification.user_id, oat_user_tag_verification.tag_id, oat_user_tag_verification.verification_status, oat_user_tag_verification.created_at, oat_user_tag_verification.updated_at"
"              ->  Materialize  (cost=1765717.25..1812477.56 rows=9352062 width=38)"
"                    Output: outv.ctid, outv.tag_id, outv.user_id, outv.verification_status, outv.created_at, outv.updated_at"
"                    ->  Sort  (cost=1765717.25..1789097.40 rows=9352062 width=38)"
"                          Output: outv.ctid, outv.tag_id, outv.user_id, outv.verification_status, outv.created_at, outv.updated_at"
"                          Sort Key: outv.tag_id, outv.user_id, outv.verification_status, outv.created_at, outv.updated_at"
"                          ->  Seq Scan on oatarchival.oat_user_tag_verification outv  (cost=0.00..171454.62 rows=9352062 width=38)"
"                                Output: outv.ctid, outv.tag_id, outv.user_id, outv.verification_status, outv.created_at, outv.updated_at"
"        ->  Seq Scan on oat.fp_archived f  (cost=0.00..191863.83 rows=1103642 width=14)"
"              Output: f.ctid, f.tag_id"
"              Filter: (NOT f.is_archived)"

Here are the CREATE TABLE definitions of the tables involved:

Table fp_archived:

CREATE TABLE fp_archived
(
  tag_id bigint NOT NULL,
  detection_url text,
  image_id bigint NOT NULL,
  pixel_x smallint NOT NULL,
  camera_num smallint NOT NULL,
  pixel_y smallint NOT NULL,
  width smallint NOT NULL,
  height smallint NOT NULL,
  is_archived boolean DEFAULT false,
  id bigint NOT NULL DEFAULT nextval('fp_archived_seq'::regclass),
  drive_id character varying(255),
  CONSTRAINT fp_archived_pkey PRIMARY KEY (id)
)

Table oat_user_tag_verification:

CREATE TABLE oatarchival.oat_user_tag_verification
(
  user_id integer NOT NULL,
  tag_id bigint NOT NULL,
  verification_status integer NOT NULL,
  created_at timestamp without time zone NOT NULL DEFAULT now(),
  updated_at timestamp without time zone DEFAULT now(),
  CONSTRAINT oat_user_tag_verification_pkey PRIMARY KEY (user_id, tag_id, verification_status, created_at),
  CONSTRAINT oat_user_tag_verification_tag_id_fkey FOREIGN KEY (tag_id)
      REFERENCES oatarchival.oat_tags (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT oat_user_tag_verification_user_id_fkey FOREIGN KEY (user_id)
      REFERENCES oatarchival.oat_users (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT oat_user_tag_verification_verification_status_fkey FOREIGN KEY (verification_status)
      REFERENCES oatarchival.oat_tag_verification_status (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION
)

The delete query runs for hours and hours. How can I optimize it? What indexes should I create to make this query faster?

Sorry, wrong link. Check here for some tips: MySQL index TIPS
– Juan Carlos Oropeza, Commented Dec 4, 2016 at 7:14

Laurenz Albe · Accepted Answer · 2016-12-04 20:38:26Z

1

Based on your EXPLAIN output (unfortunately you didn't run EXPLAIN (ANALYZE)), I'd suggest the following indexes:

CREATE INDEX ON oatarchival.oat_user_tag_verification(
   tag_id,
   user_id,
   verification_status,
   created_at,
   updated_at
);

CREATE INDEX ON oat.oat_user_tag_verification(
   tag_id,
   user_id,
   verification_status,
   created_at,
   updated_at
);

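These can help with the merge join: with both inputs available in index order, the planner may be able to skip the two Sort steps shown in the plan.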

Then I'd create the following index:

CREATE INDEX ON oat.fp_archived(tag_id);

This will speed up the nested loop join.
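Because the query only reads rows of fp_archived where is_archived is false, a partial index is one possible variant of the index above. This is a sketch that has not been tested against the asker's data. It assumes that rows with is_archived = false are a small enough share of the table for the smaller index to pay off, which the question does not confirm.

```sql
-- Hypothetical variant: index only the rows the DELETE actually looks at.
-- The predicate mirrors the plan's filter "(NOT f.is_archived)".
CREATE INDEX ON oat.fp_archived (tag_id)
WHERE NOT is_archived;
```

If most rows are not archived, the plain index on tag_id is likely the simpler choice.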

Not sure if that is the best way to run the query, but it's a starting point.
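To check whether the indexes actually change the plan, the statement can be run under EXPLAIN (ANALYZE). Note that EXPLAIN (ANALYZE) really executes the DELETE, so the sketch below wraps it in a transaction that is rolled back. It will still take as long as the real delete, and the BUFFERS option is an optional addition, not something the answer asked for.

```sql
BEGIN;

-- ANALYZE executes the statement and reports actual row counts and timings.
EXPLAIN (ANALYZE, BUFFERS)
DELETE FROM oat.oat_user_tag_verification
USING oatarchival.oat_user_tag_verification outv, oat.fp_archived f
WHERE outv.tag_id = f.tag_id
  AND f.is_archived = false
  AND oat_user_tag_verification.user_id = outv.user_id
  AND oat_user_tag_verification.tag_id = outv.tag_id
  AND oat_user_tag_verification.verification_status = outv.verification_status
  AND oat_user_tag_verification.created_at = outv.created_at
  AND oat_user_tag_verification.updated_at = outv.updated_at;

-- Undo the delete so the data is left unchanged.
ROLLBACK;
```

Comparing the estimated row counts with the actual ones in that output shows whether the planner's estimate of one matching row is realistic.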

answered Dec 4, 2016 at 20:38

Laurenz Albe


4 Comments

Tisha Over a year ago

Please help me figure out what I can improve about this question to help lift my question ban.

Laurenz Albe Over a year ago

What do you mean by "question ban"?

Tisha Over a year ago

Banned from asking questions on stackoverflow :(

Laurenz Albe Over a year ago

No idea - maybe you should ask Stackoverflow.

JosMac · Accepted Answer · 2016-12-05 13:13:46Z

1

One hint from bad experience: try adjusting the work_mem setting for the session. I had a similar problem with incredibly high query costs on a new PostgreSQL 9.6 installation and found that it simply needed a higher work_mem limit.
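A minimal sketch of changing work_mem for the current session only. The value 256MB is purely illustrative and not a recommendation; a suitable value depends on available memory and on how many sorts run concurrently.

```sql
-- Show the current setting for comparison.
SHOW work_mem;

-- Raise the limit for this session only (illustrative value).
SET work_mem = '256MB';

-- ... run the DELETE (or its EXPLAIN) here ...

-- Return to the server's configured default afterwards.
RESET work_mem;
```

The plan in the question sorts tens of millions of rows, so a larger work_mem may reduce how much of that sorting spills to disk.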

answered Dec 5, 2016 at 13:13

JosMac
