2

I am trying to understand why Postgresql doesn't use index when one of the condition is in subquery.

I have index on (ticker, date)

EXPLAIN ANALYZE SELECT OPEN 
FROM
    daily_prices dp 
WHERE
    dp.ticker = ( SELECT ticker FROM tickers_profile WHERE short_name = '1TECH' ) 
    AND DATE <= '2022-07-05'
Gather  (cost=1008.30..116680.66 rows=1065 width=5) (actual time=191.767..731.514 rows=285 loops=1)
  Workers Planned: 2
  Params Evaluated: $0
  Workers Launched: 2
  InitPlan 1 (returns $0)
    ->  Index Scan using tickers_profile_short_name_idx on tickers_profile  (cost=0.28..8.30 rows=1 width=6) (actual time=0.023..0.024 rows=1 loops=1)
          Index Cond: (short_name = '1TECH'::citext)
  ->  Parallel Seq Scan on daily_prices dp  (cost=0.00..115565.86 rows=444 width=5) (actual time=364.367..719.817 rows=95 loops=3)
        Filter: ((date <= '2022-07-05'::date) AND ((ticker)::text = $0))
        Rows Removed by Filter: 674481
Planning Time: 0.134 ms
Execution Time: 731.557 ms

If subquery is inlined (as a param)

EXPLAIN ANALYZE SELECT OPEN 
FROM
    daily_prices dp 
WHERE
    dp.ticker = '03041'
    AND DATE <= '2022-07-05'
Bitmap Heap Scan on daily_prices dp  (cost=30.42..3646.10 rows=975 width=5) (actual time=0.506..0.758 rows=285 loops=1)
  Recheck Cond: (((ticker)::text = '03041'::text) AND (date <= '2022-07-05'::date))
  Heap Blocks: exact=16
  ->  Bitmap Index Scan on daily_prices_pkey  (cost=0.00..30.18 rows=975 width=0) (actual time=0.490..0.490 rows=285 loops=1)
        Index Cond: (((ticker)::text = '03041'::text) AND (date <= '2022-07-05'::date))
Planning Time: 0.077 ms
Execution Time: 0.793 ms

I have tried:

  1. CTE (with statements), same performance
  2. Join statement, same performance

EDIT: Using a functions work

CREATE 
    OR REPLACE FUNCTION get_ticker_from_short_name ( _short_name VARCHAR ) RETURNS TEXT AS $BODY$ SELECT
    ticker 
FROM
    tickers_profile 
WHERE
    short_name = _short_name$BODY$ LANGUAGE SQL VOLATILE COST 100
EXPLAIN ANALYZE SELECT OPEN 
FROM
    daily_prices dp 
WHERE
    dp.ticker = ( SELECT get_ticker_from_short_name ( '1TECH' ) ) 
    AND DATE <= '2022-07-05'
Bitmap Heap Scan on daily_prices dp  (cost=36.61..3969.86 rows=1065 width=5) (actual time=0.369..0.565 rows=285 loops=1)
  Recheck Cond: (((ticker)::text = $0) AND (date <= '2022-07-05'::date))
  Heap Blocks: exact=16
  InitPlan 1 (returns $0)
    ->  ProjectSet  (cost=0.00..5.27 rows=1000 width=32) (actual time=0.314..0.314 rows=1 loops=1)
          ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.000..0.001 rows=1 loops=1)
  ->  Bitmap Index Scan on daily_prices_pkey  (cost=0.00..31.08 rows=1065 width=0) (actual time=0.362..0.362 rows=285 loops=1)
        Index Cond: (((ticker)::text = $0) AND (date <= '2022-07-05'::date))
Planning Time: 0.088 ms
Execution Time: 0.604 ms
2
  • Your queries should be identical, yet they return a different numbers of rows. I can't explain what is happening if I don't trust your description of what it is that needs an explanation. Commented Jul 5, 2022 at 19:44
  • @jjanes sorry for the typo, edited the question, the rows are same now. Commented Jul 5, 2022 at 23:46

3 Answers 3

2

PostgreSQL could use an index there, but it prefers not to, probably because there are some values for daily_prices.ticker that occur very frequently and would make an index scan a bad choice. The optimizer doesn't know which value the subquery will return. Perhaps split the job into two queries, then the optimizer knows when it plans the second query.

If you know that none of the frequent values can be returned, you could add a WHERE condition like AND dp.ticker NOT IN ('frequent value 1', 'frequent value 2').

Sign up to request clarification or add additional context in comments.

3 Comments

That was my first thought, but if that were the case the row estimates for the generic and the specific queries should be far apart, but they are not.
EDIT: Replacing the subquery with function works. Is the function evaluated before planning time?
Right, that's mysterious.
1

Using a function works for my case

CREATE 
    OR REPLACE FUNCTION get_ticker_from_short_name ( _short_name VARCHAR ) RETURNS TEXT AS $BODY$ SELECT
    ticker 
FROM
    tickers_profile 
WHERE
    short_name = _short_name$BODY$ LANGUAGE SQL VOLATILE COST 100
EXPLAIN ANALYZE SELECT OPEN 
FROM
    daily_prices dp 
WHERE
    dp.ticker = ( SELECT get_ticker_from_short_name ( '1TECH' ) ) 
    AND DATE <= '2022-07-05'
Bitmap Heap Scan on daily_prices dp  (cost=36.61..3969.86 rows=1065 width=5) (actual time=0.369..0.565 rows=285 loops=1)
  Recheck Cond: (((ticker)::text = $0) AND (date <= '2022-07-05'::date))
  Heap Blocks: exact=16
  InitPlan 1 (returns $0)
    ->  ProjectSet  (cost=0.00..5.27 rows=1000 width=32) (actual time=0.314..0.314 rows=1 loops=1)
          ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.000..0.001 rows=1 loops=1)
  ->  Bitmap Index Scan on daily_prices_pkey  (cost=0.00..31.08 rows=1065 width=0) (actual time=0.362..0.362 rows=285 loops=1)
        Index Cond: (((ticker)::text = $0) AND (date <= '2022-07-05'::date))
Planning Time: 0.088 ms
Execution Time: 0.604 ms

Comments

0

I had a similar issue.

For me changing from WHERE dp.ticker = to WHERE dp.ticker IN fixed it.

Fixed query:

EXPLAIN ANALYZE SELECT OPEN 
FROM
    daily_prices dp 
WHERE
    dp.ticker IN ( SELECT ticker FROM tickers_profile WHERE short_name = '1TECH' ) 
    AND DATE <= '2022-07-05'

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.