How to make a postgres query run faster?

Question

I have the following two tables. Table1 has more than 25 million rows.

Table1:

chrom strand ref_base alt_base pos      gene_ensembl_identifier seq_window_9mers  mutated_base seq_window_mut_9mers 
----- ------ -------- -------- -------- ----------------------- ----------------- ------------ -------------------- 
3     1      C        T        40457498 ENSG00000168032         ACGCTCTACACACACAG A            ACGCTCTAAACACACAG

Table2

seq_window_mut_9mers start substring 
-------------------- ----- --------- 
ACGCTCTAAACACACAG    1     ACGCTCTAA
ACGCTCTAAACACACAG    2     CGCTCTAAA
ACGCTCTAAACACACAG    3     GCTCTAAAC
ACGCTCTAAACACACAG    4     CTCTAAACA
ACGCTCTAAACACACAG    5     TCTAAACAC
ACGCTCTAAACACACAG    6     CTAAACACA
ACGCTCTAAACACACAG    7     TAAACACAC
ACGCTCTAAACACACAG    8     AAACACACA
ACGCTCTAAACACACAG    9     AACACACAG

I would like to join the two tables on the column seq_window_mut_9mers to get the following table.

final_table

chrom strand ref_base alt_base pos      gene_ensembl_identifier seq_window_mut_9mers substring
----- ------ -------- -------- -------- ----------------------- -------------------- ---------
3     1      C        T        40457498 ENSG00000168032         ACGCTCTAAACACACAG    ACGCTCTAA
3     1      C        T        40457498 ENSG00000168032         ACGCTCTAAACACACAG    CGCTCTAAA
3     1      C        T        40457498 ENSG00000168032         ACGCTCTAAACACACAG    GCTCTAAAC
3     1      C        T        40457498 ENSG00000168032         ACGCTCTAAACACACAG    CTCTAAACA
3     1      C        T        40457498 ENSG00000168032         ACGCTCTAAACACACAG    TCTAAACAC
3     1      C        T        40457498 ENSG00000168032         ACGCTCTAAACACACAG    CTAAACACA
3     1      C        T        40457498 ENSG00000168032         ACGCTCTAAACACACAG    TAAACACAC
3     1      C        T        40457498 ENSG00000168032         ACGCTCTAAACACACAG    AAACACACA
3     1      C        T        40457498 ENSG00000168032         ACGCTCTAAACACACAG    AACACACAG

I am running the following Postgres query through DbVisualizer. At the moment the query is very slow (I am still waiting for the output after more than 10 minutes).

SELECT
  table1.chrom, table1.strand, table1.ref_base, table1.alt_base, table1.pos,
  table1.gene_ensembl_identifier, table1.seq_window_mut_9mers, table2.substring
FROM table1
LEFT JOIN table2 ON table2.seq_window_mut_9mers = table1.seq_window_mut_9mers;

How can I make it run faster? Any suggestions would be really helpful.

Thanks

Please edit your question and add the execution plan generated using explain (analyze, buffers, format text) (not just a "simple" explain) as formatted text, and make sure you preserve the indentation of the plan. Paste the text, then put ``` on the line before the plan and on the line after the plan. Please also include the complete create index statements for all indexes.
– user330315, Commented Feb 10, 2020 at 10:57
And where is the query? Also provide the schema's DDL, i.e. the CREATE TABLE statements of the tables and the CREATE INDEX statements of their indexes, and the explain plan.
– sticky bit, Commented Feb 10, 2020 at 10:57

Tometzky · Accepted Answer · 2020-02-10 11:30:47Z

3

It looks like you don't really need to join with table2. You can generate its rows on the fly with the substring function, like this:

select
  table1.*,
  offsets.start,
  substring(seq_window_mut_9mers from offsets.start for 9) as substring
from
  table1,
  (select generate_series(1,9) as start) as offsets;

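It will be much faster than a join, because each of the nine substrings is computed directly from the table1 row and table2 never has to be matched against the 25 million rows.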
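As a quick sanity check of this approach, the sketch below runs the same substring and generate_series expansion against the single sample sequence from the question instead of the real table1. The CTE name sample and the alias o are illustrative placeholders, not part of the original schema, and PostgreSQL is assumed.

```sql
-- One sample row from the question stands in for table1 (hypothetical CTE).
with sample(seq_window_mut_9mers) as (
  values ('ACGCTCTAAACACACAG')
)
select
  s.seq_window_mut_9mers,
  o.start,
  -- 9 characters starting at position o.start (1-based)
  substring(s.seq_window_mut_9mers from o.start for 9) as substring
from
  sample as s
  -- a 17-character window yields 9 overlapping 9-mers, at offsets 1..9
  cross join generate_series(1, 9) as o(start)
order by
  o.start;
```

For this input the result should match the nine rows shown for Table2 in the question. If that holds, replacing sample with table1 gives the same shape of output as the query above without reading table2 at all.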
