1

I am using PostgreSQL for its array functionality. Here's my schema:

CREATE TABLE questions (
  id INTEGER PRIMARY KEY,
  product_id INTEGER UNIQUE NOT NULL,
  body VARCHAR(1000) NOT NULL,
  date_written DATE NOT NULL DEFAULT current_date,
  asker_name VARCHAR(60) NOT NULL,
  asker_email VARCHAR(60) NOT NULL,
  reported BOOLEAN DEFAULT FALSE,
  helpful INTEGER NOT NULL DEFAULT 0
);

CREATE TABLE answers (
  id PRIMARY KEY NOT NULL,
  question_id INTEGER NOT NULL,
  body VARCHAR(1000) NOT NULL,
  date_written DATE NOT NULL DEFAULT current_date,
  answerer_name VARCHAR(60) NOT NULL,
  answerer_email VARCHAR(60) NOT NULL,
  reported BOOLEAN DEFAULT FALSE,
  helpful INTEGER NOT NULL DEFAULT 0
);

CREATE TABLE photos (
  id INTEGER UNIQUE,
  answer_id INTEGER NOT NULL,
  photo VARCHAR(200)
);

I am trying to query my answers table to get a list of all the answers for a given question id and include an array of all photos that exist for that given answer_id. The results should be sorted in descending order of helpfulness. So far, I have a massive query that displays the results I'm looking for, but the execution time is 729.595 ms. I am trying to optimize to get the query's time down to 200 ms. I have the following indexes to try and optimize my query times:

  indexname    |                                 indexdef                                  
-----------------+---------------------------------------------------------------------------
 answer_id       | CREATE UNIQUE INDEX answer_id ON public.answers USING btree (id)
 question_id     | CREATE INDEX question_id ON public.answers USING btree (question_id)
 idx_reported_id | CREATE INDEX idx_reported_id ON public.answers USING btree (reported, id)
 answers_pkey    | CREATE UNIQUE INDEX answers_pkey ON public.answers USING btree (id)
  indexname    |                                  indexdef                                  
----------------+----------------------------------------------------------------------------
 id             | CREATE UNIQUE INDEX id ON public.questions USING btree (id)
 idx_q_reported | CREATE INDEX idx_q_reported ON public.questions USING btree (id, reported)
 questions_pkey | CREATE UNIQUE INDEX questions_pkey ON public.questions USING btree (id)
   indexname   |                              indexdef                               
---------------+---------------------------------------------------------------------
 photos_id_key | CREATE UNIQUE INDEX photos_id_key ON public.photos USING btree (id)
 p_links       | CREATE INDEX p_links ON public.photos USING btree (photo)

In my analysis, I noticed that the GroupAggregate is time-consuming: GroupAggregate (cost=126222.21..126222.71 rows=25 width=129) (actual time=729.497..729.506 rows=5 loops=1) Group Key: answers.id

Is there a way I can avoid the time-consuming GROUP BY? Am I missing something with the indexes? Here is the query itself:

SELECT answers.id, 
       question_id, 
       body, 
       date_written, 
       answerer_name, 
       answerer_email, 
       reported, 
       helpful, 
       ARRAY_AGG(photo) as photos 
FROM answers 
  LEFT JOIN photos ON answers.id = photos.answer_id 
WHERE reported IS 
false AND answers.id IN (SELECT id 
                         FROM answers 
                         WHERE question_id = 20012) 
GROUP BY answers.id 
ORDER BY helpful DESC;

Thanks!

2 Answers 2

1

I think you can skip the subquery:

SELECT answers.id, question_id, body, date_written, answerer_name, answerer_email, reported, helpful, ARRAY_AGG(photo) as photos 
FROM answers 
LEFT JOIN photos ON answers.id = photos.answer_id 
WHERE reported IS false AND question_id = 20012 
GROUP BY answers.id, question_id, body, date_written, answerer_name, answerer_email, reported, helpful
ORDER BY helpful DESC;

You can add a btree index on photos.answer_id because this field is use in the join clause.

You losed same fields on the GROUP BY clause;

Sign up to request clarification or add additional context in comments.

1 Comment

I will try to change the WHERE order WHERE question_id = 20012 AND reported IS false. To limit the amount of rows first, but maybe Postgresql do that automatically.
1

One way that often works, is to aggregate first, then join on the result (rather than aggregating the full result). And you don't really need the IN condition either

SELECT a.id, 
       a.question_id, 
       a.body, 
       a.date_written, 
       a.answerer_name, 
       a.answerer_email, 
       a.reported, 
       a.helpful, 
       p.photos 
FROM answers a
  LEFT JOIN (
    select answer_id, array_agg(photo) as photos 
    from photos
    group by answer_id
  ) p ON a.id = p.answer_id 
WHERE reported IS false 
  AND a.question_id = 20012
ORDER BY a.helpful DESC;

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.