
I have 3 tables. Stocks have many news articles, and news articles can refer to one or more stocks. This is modelled with a stocks table, a news table and a stock_news join table.

How would I get the latest news article for each of, say, 30 stock symbols that I provide? What indices would make this most efficient?

My news table has id, link, published_at (index on published_at; id is the primary key).

My stocks table has id, symbol (index on symbol; id is the primary key).

My stock_news table has stock_id, news_id (an index on each column individually, plus a combined index on both).
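As a minimal sketch, the schema described above could be written as DDL roughly like this. It uses SQLite through Python's built-in sqlite3 module as a stand-in for PostgreSQL; the column types and index names are assumptions, not taken from the question.

```python
import sqlite3

# Hypothetical reconstruction of the described schema (SQLite stand-in for
# PostgreSQL). Column types and index names are assumptions.
SCHEMA = """
CREATE TABLE stocks (
    id     INTEGER PRIMARY KEY,
    symbol TEXT NOT NULL
);
CREATE INDEX stocks_symbol_idx ON stocks (symbol);

CREATE TABLE news (
    id           INTEGER PRIMARY KEY,
    link         TEXT NOT NULL,
    published_at TEXT NOT NULL  -- ISO-8601 timestamp string in SQLite
);
CREATE INDEX news_published_at_idx ON news (published_at);

-- Join table: one row per (stock, article) pair.
CREATE TABLE stock_news (
    stock_id INTEGER NOT NULL REFERENCES stocks (id),
    news_id  INTEGER NOT NULL REFERENCES news (id)
);
CREATE INDEX stock_news_stock_id_idx ON stock_news (stock_id);
CREATE INDEX stock_news_news_id_idx ON stock_news (news_id);
CREATE INDEX stock_news_stock_id_news_id_idx ON stock_news (stock_id, news_id);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# List the explicitly created indexes (auto-indexes have NULL sql).
index_names = sorted(
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND sql IS NOT NULL"
    )
)
print(index_names)
```

Because a multicolumn B-tree index can also serve lookups on its leading column, the single-column index on stock_id is probably redundant next to the combined (stock_id, news_id) index.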

Currently I am using the following query, but I was wondering if this is the best way:

SELECT n.link, s.symbol, n.published_at
FROM news n
JOIN stock_news sn ON n.id = sn.news_id
JOIN stocks s ON s.id = sn.stock_id
WHERE s.symbol IN ('AAPL', 'GOOG' ... etc)
ORDER BY n.published_at DESC;

Running EXPLAIN on this query against some demo data shows:

 Sort  (cost=8.92..8.92 rows=1 width=115)
   Sort Key: n.published_at DESC
   ->  Nested Loop  (cost=3.50..8.92 rows=1 width=115)
         ->  Hash Join  (cost=3.45..7.51 rows=1 width=12)
               Hash Cond: (s.id = sn.stock_id)
               ->  Seq Scan on stocks s  (cost=0.00..4.05 rows=2 width=12)
                     Filter: ((symbol)::text = ANY ('{AAPL,GOOG}'::text[]))
               ->  Hash  (cost=2.67..2.67 rows=223 width=16)
                     ->  Seq Scan on stock_news sn  (cost=0.00..2.67 rows=223 width=16)
         ->  Index Scan using news_pkey on news n  (cost=0.05..1.40 rows=1 width=119)
               Index Cond: (id = sn.news_id)
Comment: Looks good, but you might want to check out the Code Review forum. (Mar 15, 2018 at 4:01)

1 Answer


If you want only the latest article for each symbol, I would recommend DISTINCT ON:

SELECT DISTINCT ON (s.symbol) n.link, s.symbol, n.published_at
FROM news n JOIN
     stock_news sn
     ON n.id = sn.news_id JOIN
     stocks s
     ON s.id = sn.stock_id 
WHERE s.symbol IN ('AAPL', 'GOOG' ... etc) 
ORDER BY s.symbol, n.published_at DESC;
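The difference between the question's query and the DISTINCT ON version can be checked on toy data. SQLite (used here through Python's sqlite3 module as a stand-in for PostgreSQL) has no DISTINCT ON, so this sketch swaps in the standard ROW_NUMBER() window function, which PostgreSQL also supports; it assumes SQLite 3.25 or later. The data, links, and helper names are hypothetical.

```python
import sqlite3

# Minimal demo schema and hypothetical data (SQLite stand-in for PostgreSQL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stocks (id INTEGER PRIMARY KEY, symbol TEXT NOT NULL);
CREATE TABLE news (id INTEGER PRIMARY KEY, link TEXT NOT NULL,
                   published_at TEXT NOT NULL);
CREATE TABLE stock_news (stock_id INTEGER NOT NULL, news_id INTEGER NOT NULL);

INSERT INTO stocks VALUES (1, 'AAPL'), (2, 'GOOG'), (3, 'MSFT');
INSERT INTO news VALUES
    (1, 'https://example.com/a', '2018-03-10T09:00:00'),
    (2, 'https://example.com/b', '2018-03-12T09:00:00'),
    (3, 'https://example.com/c', '2018-03-14T09:00:00'),
    (4, 'https://example.com/d', '2018-03-15T09:00:00');
-- Article 2 mentions both AAPL and GOOG.
INSERT INTO stock_news VALUES (1, 1), (1, 2), (2, 2), (2, 3), (3, 4);
""")


def placeholders(symbols):
    # One "?" per symbol so the list is passed as bound parameters.
    return ", ".join("?" for _ in symbols)


def all_articles(conn, symbols):
    """The question's query: every matching article, newest first."""
    sql = f"""
        SELECT n.link, s.symbol, n.published_at
        FROM news n
        JOIN stock_news sn ON n.id = sn.news_id
        JOIN stocks s ON s.id = sn.stock_id
        WHERE s.symbol IN ({placeholders(symbols)})
        ORDER BY n.published_at DESC
    """
    return conn.execute(sql, list(symbols)).fetchall()


def latest_per_symbol(conn, symbols):
    """Window-function equivalent of DISTINCT ON (s.symbol)."""
    sql = f"""
        SELECT link, symbol, published_at
        FROM (
            SELECT n.link, s.symbol, n.published_at,
                   ROW_NUMBER() OVER (
                       PARTITION BY s.symbol
                       ORDER BY n.published_at DESC
                   ) AS rn
            FROM news n
            JOIN stock_news sn ON n.id = sn.news_id
            JOIN stocks s ON s.id = sn.stock_id
            WHERE s.symbol IN ({placeholders(symbols)})
        ) AS ranked
        WHERE rn = 1  -- keep only the newest article per symbol
        ORDER BY symbol
    """
    return conn.execute(sql, list(symbols)).fetchall()


symbols = ["AAPL", "GOOG"]
everything = all_articles(conn, symbols)
latest = latest_per_symbol(conn, symbols)
print(len(everything))
print(latest)
```

With this data the question's query returns four rows for AAPL and GOOG (article 2 appears once per symbol), while the ranked version returns one row per symbol. As with DISTINCT ON, if two articles for the same symbol share a published_at value, which one is kept is not specified unless you add a tiebreaker such as n.id DESC to the ORDER BY.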

For performance you want indexes on: stocks(symbol, id), stock_news(stock_id, news_id), and news(id) (the last is already provided by the primary key).
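A minimal sketch of creating the suggested indexes, treating stocks(symbol, id) and stock_news(stock_id, news_id) as multicolumn indexes (the usual reading of that notation), and asking the planner how it would run the join. SQLite via Python's sqlite3 module is a stand-in here; PostgreSQL's planner makes its own decisions and, on tiny demo tables like the ones behind the EXPLAIN output in the question, may still prefer sequential scans. Index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stocks (id INTEGER PRIMARY KEY, symbol TEXT NOT NULL);
-- news.id is the primary key, which already acts as the news(id) index.
CREATE TABLE news (id INTEGER PRIMARY KEY, link TEXT NOT NULL,
                   published_at TEXT NOT NULL);
CREATE TABLE stock_news (stock_id INTEGER NOT NULL, news_id INTEGER NOT NULL);

-- The two multicolumn indexes from the answer (names are hypothetical).
CREATE INDEX stocks_symbol_id_idx ON stocks (symbol, id);
CREATE INDEX stock_news_stock_id_news_id_idx ON stock_news (stock_id, news_id);
""")

query = """
    SELECT n.link, s.symbol, n.published_at
    FROM news n
    JOIN stock_news sn ON n.id = sn.news_id
    JOIN stocks s ON s.id = sn.stock_id
    WHERE s.symbol IN ('AAPL', 'GOOG')
    ORDER BY s.symbol, n.published_at DESC
"""

# Each EXPLAIN QUERY PLAN row has four columns; the last one is the
# human-readable description of the step.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
for step in plan:
    print(step)
```

In PostgreSQL the equivalent check is running EXPLAIN (or EXPLAIN ANALYZE) on the real query once the tables hold a realistic amount of data.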


3 Comments

Yep, +1, except that SQL looks funny to me :)
Why not an index on news(published_at)? Also, when you say stocks(symbol, id), is that a multicolumn index or 2 separate indices?
@TerenceChow, published_at is the second key in the ORDER BY. An index is useful for that.
