There are two schemas in the same database: oatarchival and oat. The two schemas are structurally identical.

Here is the query I am running, which takes a very long time:

DELETE FROM oat.oat_user_tag_verification
USING oatarchival.oat_user_tag_verification outv, oat.fp_archived f
WHERE outv.tag_id = f.tag_id
  AND f.is_archived = false
  AND oat_user_tag_verification.user_id = outv.user_id
  AND oat_user_tag_verification.tag_id = outv.tag_id
  AND oat_user_tag_verification.verification_status = outv.verification_status
  AND oat_user_tag_verification.created_at = outv.created_at
  AND oat_user_tag_verification.updated_at = outv.updated_at;

Here is the EXPLAIN VERBOSE output for this query:

"Delete on oat.oat_user_tag_verification  (cost=14989031.30..16227081.67 rows=1 width=18)"
"  ->  Nested Loop  (cost=14989031.30..16227081.67 rows=1 width=18)"
"        Output: oat_user_tag_verification.ctid, outv.ctid, f.ctid"
"        Join Filter: (outv.tag_id = f.tag_id)"
"        ->  Merge Join  (cost=14989031.30..16021422.32 rows=1 width=28)"
"              Output: oat_user_tag_verification.ctid, oat_user_tag_verification.tag_id, outv.ctid, outv.tag_id"
"              Merge Cond: ((oat_user_tag_verification.tag_id = outv.tag_id) AND (oat_user_tag_verification.user_id = outv.user_id) AND (oat_user_tag_verification.verification_status = outv.verification_status) AND (oat_user_tag_verification.created_at = ou (...)"
"              ->  Sort  (cost=13223314.06..13368102.38 rows=57915328 width=38)"
"                    Output: oat_user_tag_verification.ctid, oat_user_tag_verification.user_id, oat_user_tag_verification.tag_id, oat_user_tag_verification.verification_status, oat_user_tag_verification.created_at, oat_user_tag_verification.updated_at"
"                    Sort Key: oat_user_tag_verification.tag_id, oat_user_tag_verification.user_id, oat_user_tag_verification.verification_status, oat_user_tag_verification.created_at, oat_user_tag_verification.updated_at"
"                    ->  Seq Scan on oat.oat_user_tag_verification  (cost=0.00..1005001.28 rows=57915328 width=38)"
"                          Output: oat_user_tag_verification.ctid, oat_user_tag_verification.user_id, oat_user_tag_verification.tag_id, oat_user_tag_verification.verification_status, oat_user_tag_verification.created_at, oat_user_tag_verification.updated_at"
"              ->  Materialize  (cost=1765717.25..1812477.56 rows=9352062 width=38)"
"                    Output: outv.ctid, outv.tag_id, outv.user_id, outv.verification_status, outv.created_at, outv.updated_at"
"                    ->  Sort  (cost=1765717.25..1789097.40 rows=9352062 width=38)"
"                          Output: outv.ctid, outv.tag_id, outv.user_id, outv.verification_status, outv.created_at, outv.updated_at"
"                          Sort Key: outv.tag_id, outv.user_id, outv.verification_status, outv.created_at, outv.updated_at"
"                          ->  Seq Scan on oatarchival.oat_user_tag_verification outv  (cost=0.00..171454.62 rows=9352062 width=38)"
"                                Output: outv.ctid, outv.tag_id, outv.user_id, outv.verification_status, outv.created_at, outv.updated_at"
"        ->  Seq Scan on oat.fp_archived f  (cost=0.00..191863.83 rows=1103642 width=14)"
"              Output: f.ctid, f.tag_id"
"              Filter: (NOT f.is_archived)"

Here is the create table structure of all tables involved:

Table fp_archived:

CREATE TABLE fp_archived
(
  tag_id bigint NOT NULL,
  detection_url text,
  image_id bigint NOT NULL,
  pixel_x smallint NOT NULL,
  camera_num smallint NOT NULL,
  pixel_y smallint NOT NULL,
  width smallint NOT NULL,
  height smallint NOT NULL,
  is_archived boolean DEFAULT false,
  id bigint NOT NULL DEFAULT nextval('fp_archived_seq'::regclass),
  drive_id character varying(255),
  CONSTRAINT fp_archived_pkey PRIMARY KEY (id)
)

Table oat_user_tag_verification:

CREATE TABLE oatarchival.oat_user_tag_verification
(
  user_id integer NOT NULL,
  tag_id bigint NOT NULL,
  verification_status integer NOT NULL,
  created_at timestamp without time zone NOT NULL DEFAULT now(),
  updated_at timestamp without time zone DEFAULT now(),
  CONSTRAINT oat_user_tag_verification_pkey PRIMARY KEY (user_id, tag_id, verification_status, created_at),
  CONSTRAINT oat_user_tag_verification_tag_id_fkey FOREIGN KEY (tag_id)
      REFERENCES oatarchival.oat_tags (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT oat_user_tag_verification_user_id_fkey FOREIGN KEY (user_id)
      REFERENCES oatarchival.oat_users (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT oat_user_tag_verification_verification_status_fkey FOREIGN KEY (verification_status)
      REFERENCES oatarchival.oat_tag_verification_status (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION
)

The delete query runs for hours. How can I optimize it? Which indexes should I create to make this query faster?

  • You should CREATE INDEX for the fields involved. Commented Dec 4, 2016 at 6:55
  • Can you tell me which index I should create? Commented Dec 4, 2016 at 7:04
  • Sorry, wrong link. Check here for some tips: MySQL index TIPS Commented Dec 4, 2016 at 7:14
  • This is postgres, does that matter though? Commented Dec 4, 2016 at 7:31
  • not really, tips are the same Commented Dec 4, 2016 at 7:35

2 Answers

1

Based on your EXPLAIN output (unfortunately you didn't run EXPLAIN (ANALYZE)) I'd suggest the following indexes:

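-- ctid is a system column and is left out here: PostgreSQL does not
-- support indexing system columns, and every index entry already
-- points at its row.
CREATE INDEX ON oatarchival.oat_user_tag_verification(
   tag_id,
   user_id,
   verification_status,
   created_at,
   updated_at
);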

CREATE INDEX ON oat.oat_user_tag_verification(
   tag_id,
   user_id,
   verification_status,
   created_at,
   updated_at
);

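These can help with the merge join: with both tables indexed in the same column order as the sort key in your plan, the planner may be able to read the rows already sorted instead of sorting 57.9 million and 9.3 million rows.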

Then I'd create the following index:

CREATE INDEX ON oat.fp_archived(tag_id);

This should speed up the nested loop join, because fp_archived can then be probed by tag_id instead of being scanned sequentially.

Not sure if that is the best way to run the query, but it's a starting point.
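
To get the real timings that EXPLAIN (ANALYZE) would show, keep in mind that it actually executes the statement. For a DELETE, one way to avoid losing rows while measuring is to wrap it in a transaction and roll back. A minimal sketch, reusing the query from the question (the BUFFERS option is my addition, and the run will still take about as long as the real delete):

```sql
BEGIN;

-- EXPLAIN (ANALYZE) runs the DELETE for real, so do it inside a transaction
EXPLAIN (ANALYZE, BUFFERS)
DELETE FROM oat.oat_user_tag_verification
USING oatarchival.oat_user_tag_verification outv, oat.fp_archived f
WHERE outv.tag_id = f.tag_id
  AND f.is_archived = false
  AND oat_user_tag_verification.user_id = outv.user_id
  AND oat_user_tag_verification.tag_id = outv.tag_id
  AND oat_user_tag_verification.verification_status = outv.verification_status
  AND oat_user_tag_verification.created_at = outv.created_at
  AND oat_user_tag_verification.updated_at = outv.updated_at;

-- Undo the delete; only the plan and timings are kept
ROLLBACK;
```

Comparing the output before and after creating the indexes shows whether the planner actually uses them.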


4 Comments

Please help me figure out what I can improve about this question to help lift my question ban.
What do you mean by "question ban"?
Banned from asking questions on stackoverflow :(
No idea - maybe you should ask Stackoverflow.
1

One hint from bad experience: try adjusting the work_mem setting for the session. I had a similar problem with extremely high query costs on a new PostgreSQL 9.6 installation and found that it simply needed a higher work_mem limit.
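
As a rough sketch of what that looks like: work_mem can be raised for the current session only, or for a single transaction with SET LOCAL. The 256MB value below is purely an illustrative assumption, not a recommendation; a sensible value depends on available RAM and how many sessions run sorts at the same time.

```sql
-- Check the current value first
SHOW work_mem;

-- Raise it for this session only (256MB is an assumed example value)
SET work_mem = '256MB';

-- ... run the DELETE from the question here ...

-- Return to the server default afterwards
RESET work_mem;

-- Alternative: limit the change to one transaction
BEGIN;
SET LOCAL work_mem = '256MB';
-- ... run the DELETE from the question here ...
COMMIT;
```

With SET LOCAL the setting reverts automatically when the transaction ends, so other work on the same connection is not affected.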
