0

I have following tables with Table1 with rows: >25million rows

Table1:

chrom strand ref_base alt_base pos      gene_ensembl_identifier seq_window_9mers  mutated_base seq_window_mut_9mers 
----- ------ -------- -------- -------- ----------------------- ----------------- ------------ -------------------- 
3     1      C        T        40457498 ENSG00000168032         ACGCTCTACACACACAG A            ACGCTCTAAACACACAG    

Table2

seq_window_mut_9mers start substring 
-------------------- ----- --------- 
ACGCTCTAAACACACAG    1     ACGCTCTAA
ACGCTCTAAACACACAG    2     CGCTCTAAA
ACGCTCTAAACACACAG    3     GCTCTAAAC
ACGCTCTAAACACACAG    4     CTCTAAACA
ACGCTCTAAACACACAG    5     TCTAAACAC
ACGCTCTAAACACACAG    6     CTAAACACA
ACGCTCTAAACACACAG    7     TAAACACAC
ACGCTCTAAACACACAG    8     AAACACACA
ACGCTCTAAACACACAG    9     AACACACAG


I would like to perform a join to have the following table on column seq_window_mut_9mers.

final_table

chrom strand ref_base alt_base pos      gene_ensembl_identifier   seq_window_mut_9mers  substring
----- ------ -------- -------- -------- ----------------------- ----------------- ------------ -------------------- 
3     1      C        T        40457498 ENSG00000168032           ACGCTCTAAACACACAG     ACGCTCTAA
3     1      C        T        40457498 ENSG00000168032           ACGCTCTAAACACACAG     CGCTCTAAA
3     1      C        T        40457498 ENSG00000168032           ACGCTCTAAACACACAG     GCTCTAAAC
3     1      C        T        40457498 ENSG00000168032           ACGCTCTAAACACACAG     CTCTAAACA
3     1      C        T        40457498 ENSG00000168032           ACGCTCTAAACACACAG     TCTAAACAC
3     1      C        T        40457498 ENSG00000168032           ACGCTCTAAACACACAG     CTAAACACA
3     1      C        T        40457498 ENSG00000168032           ACGCTCTAAACACACAG     TAAACACAC
3     1      C        T        40457498 ENSG00000168032           ACGCTCTAAACACACAG     AAACACACA
3     1      C        T        40457498 ENSG00000168032           ACGCTCTAAACACACAG     AACACACAG

I am running following postgres query through dbvisualizer. At the moment, the query is running very slow (still waiting for the output.. >10 mins).

SELECT 
chrom, strand, ref_base, alt_base, pos, gene_ensembl_identifier, mut.seq_window_mut_9mers substring 
FROM table1    
LEFT JOIN table2 ON mer9.seq_window_mut_9mers = table1.seq_window_mut_9mers;

How can I make it run faster? Any suggestions will be really helful.

Thanks

2
  • 1
    Please edit your question and add the execution plan generated using explain (analyze, buffers, format text) (not just a "simple" explain) as formatted text and make sure you prevent the indention of the plan. Paste the text, then put ``` on the line before the plan and on a line after the plan. Please also include complete create index statements for all indexes as well. Commented Feb 10, 2020 at 10:57
  • And where is the query? Also provide the schema's DDL, i.e. the CREATE TABLE statements of the tables and the CREATE INDEX statements of their indexes and the explain plan. Commented Feb 10, 2020 at 10:57

1 Answer 1

3

It looks like you don't really need to join with table2. You can generate it on the fly with substring function, like this:

select
  table1.*,
  offsets.start,
  substring(seq_window_mut_9mers from offsets.start for 9) as substring
from
  table1,
  (select generate_series(1,9) as start) as offsets;

It will be much faster that a join.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.