
I have 3 tables: "Portfolios", "Balances" and "Positions". My users can save multiple portfolios; each portfolio has multiple balances, and each balance has multiple positions.

Now I want to query one portfolio with its balances, and for each balance two sums (amount and fees).

My system will automatically filter for the current user.

I have the following query:

CREATE OR REPLACE FUNCTION public.getBalancesWithStats(portfolioid bigint)
RETURNS TABLE(
  id bigint,
  name text,
  portfolio_id bigint,
  overall_amount double precision,
  overall_fees double precision) AS $$
BEGIN
    RETURN QUERY
        SELECT balances.id
             , balances.name
             , balances.portfolio_id
             , SUM(positions.amount) AS overall_amount
             , SUM(positions.fee)    AS overall_fees
        FROM balances
        LEFT JOIN positions ON positions.crypto_balance_id = balances.id
        WHERE balances.portfolio_id = portfolioid
        GROUP BY balances.id
               , balances.name
               , balances.portfolio_id;
END;
$$ LANGUAGE plpgsql;
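For reference, the function can be called like any set-returning function (the portfolio id 42 below is just a placeholder):

```sql
-- Unquoted identifiers fold to lower case, so the function is actually
-- stored as getbalanceswithstats; calling it unquoted works either way.
SELECT * FROM getBalancesWithStats(42);
```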

I'm not sure whether this will become a performance issue. A user has around 3 portfolios, 10-100 balances, and 10-100k positions; positions is by far the largest table.
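To see how the planner handles this, the query inside the function can be checked directly with EXPLAIN; the ANALYZE and BUFFERS options actually execute the query and report real row counts and I/O (the portfolio id 42 is a placeholder):

```sql
-- Run the function body standalone with a sample portfolio id.
EXPLAIN (ANALYZE, BUFFERS)
SELECT balances.id, balances.name, balances.portfolio_id,
       SUM(positions.amount) AS overall_amount,
       SUM(positions.fee)    AS overall_fees
FROM balances
LEFT JOIN positions ON positions.crypto_balance_id = balances.id
WHERE balances.portfolio_id = 42
GROUP BY balances.id, balances.name, balances.portfolio_id;
```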

(screenshot: execution plan)

The result should look like

 id, name, overall_amount, overall_fees

with one row per balance (grouped by id).

Update

After I created two indexes:

create index balance_portfolio_id on balances(portfolio_id, id);
create index positions_balance_id on positions(crypto_balance_id, id);

and filled one portfolio with 1k balances and one balance with 1k positions, querying the other portfolio (which has one balance and one position) switched balances to an index scan, but positions was still a seq scan. Before I created the index on positions there was a nested loop as well.

(screenshot: execution plan)
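As a side note on the positions index (my suggestion, beyond what the post tries): since the query only reads crypto_balance_id, amount, and fee from positions, a covering index lets the planner satisfy the join and both sums with an index-only scan, never touching the heap. This is a sketch, assuming PostgreSQL 11+ for the INCLUDE clause; the index name is made up:

```sql
-- Key column for the join lookup, payload columns for the two SUMs.
CREATE INDEX positions_balance_id_covering
    ON positions (crypto_balance_id) INCLUDE (amount, fee);
```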

After I put another 10k balances and positions into one portfolio/balance, both switched to a Bitmap Heap Scan and a Bitmap Index Scan.

(screenshot: execution plan)

I'm not sure what this means; shouldn't both have been plain index scans in the first place? The "bitmap" part does not mean anything to me.

Can I improve my query even more? Thanks a lot guys!

  • 1. If the query is reading 100k rows it's bound to take some time; nowadays it should be in the milliseconds to seconds range. What's your expectation in terms of response time? Commented Jan 29, 2022 at 19:14
  • 2. Please include the execution plan. The query may be missing some indexes. Commented Jan 29, 2022 at 19:15
  • Milliseconds would be fine, haha. It's an app; even one or multiple seconds would be fine. Does it matter how many users I have, or how big the positions table grows? What is an execution plan? I have indexes on all ids. Should I put one on the sum fields? Commented Jan 29, 2022 at 19:26
  • It always matters if the table grows. I suggest you test it with tons of data. Commented Jan 29, 2022 at 19:44
  • To get the execution plan in PostgreSQL run EXPLAIN <query>. Paste the result in the question. Commented Jan 29, 2022 at 19:45

1 Answer


Observations from the execution plan:

  1. The Seq Scan (full table scan) on balances, while filtering rows for portfolio_id (passed as a parameter), selects only 2 rows and does not use an index: the index on balances.portfolio_id is MISSING or NOT USED. If the index does not exist, create it; if it does exist, verify its health (bloat etc.) and rebuild it if necessary.

  2. The join between positions and balances should be a nested loop using an index, but the plan shows a HASH join: the index on positions.crypto_balance_id is MISSING or NOT USED (positions is the inner table of the join).

  3. For the GROUP BY operation, monitor temp_blks_read and temp_blks_written for the queryid in pg_stat_statements while testing. If these are high, increase work_mem proportionately for the session.

  4. You can enable parallel query for the session. If you do, keep a watch on work_mem, because each parallel worker can grab memory up to work_mem for itself, reducing the memory available to other sessions.
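Points 3 and 4 can be sketched as session-level settings; the values below are placeholders to tune against your own workload, not recommendations:

```sql
-- Point 3: give sort/hash steps of the GROUP BY more memory,
-- for this session only (the default is typically 4MB).
SET work_mem = '64MB';

-- Point 4: allow up to 2 parallel workers per Gather node. Each worker
-- may use up to work_mem per sort/hash operation, so budget accordingly.
SET max_parallel_workers_per_gather = 2;
```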


5 Comments

Isn't an index automatically created when I create a foreign key reference? I think it will use a Seq Scan when I query almost the whole table because that's faster. I will create some test data and test it. I don't understand the second point: do I need to change something in my query, and what does "nested loop using index" mean?
Defining a FOREIGN KEY on a column does not create an index on the column; defining a PRIMARY KEY does. Your explain plan says "Seq Scan on balances ... (rows=2)", which means that after a sequential scan the operation node returns only 2 rows. Yes, it does a full table scan, but it does not need to just to retrieve 2 rows. On the second point: when there are few rows to join (here the join node returns 3 rows), a row is picked from the outer table and the matching rows from the inner table are fetched via an INDEX LOOKUP, producing the joined row in a nested loop.
I created the two indexes (create index balance_portfolio_id on balances(portfolio_id, id) and create index positions_balance_id on positions(crypto_balance_id, id)) and ran the EXPLAIN again; the result was the same. After that I created 1k balances for one portfolio and only one balance for the other, and it switched to an Index Scan for balances, but positions was still a seq scan. New picture at the post.
I added the description to the post. It's easier to read there.
