Improving performance for sql select query

Question

I am building a notification system in which a single topic (a string) can have a million subscribers. Say a million subscribers want to receive notifications for the topic "abc".

The data for these million subscribers is stored in a MySQL database.

For topic "abc", I want to retrieve this list of a million subscribers.

I am using Hibernate as my ORM. To select the list of subscribers, I run this query:

select * from Subscription AS sub INNER JOIN Topic AS t ON sub.topicId = t.topicId 
INNER JOIN Subscriber AS sr ON sub.subscriberId= sr.subscriberId  
WHERE t.topic = 'abc'

With millions of subscribers in my database, this query will scan millions of rows, so I believe it will take a long time to execute.

I am retrieving the list in my DAO (using Spring Data JPA) as

List<Subscription> subList = subRepository.findByTopicName(eventBean.getTopic());

I want the result set to be populated as fast as possible, because the idea is to send notifications in real time.

Given the above, will querying the database this way, with millions of rows, give the desired performance? My guess is no.

How can I optimize my query, and how should I retrieve the data so that I can achieve real-time performance?

I know indexes will improve performance, but how do I retrieve all this data at once? Is it possible for Hibernate to store this much data in a cache? Would such caching be efficient?

Also, will pagination help here?

I am not looking for exact solutions here, just ideas from people who have solved this kind of problem before.

I would suggest you scale your application gradually. Building and designing something now to handle millions of subscribers may be premature. Make sure you can handle thousands of users first. Then if you get there, make adjustments to handle hundreds of thousands, then millions. I think by the time you get to millions you may want to consider a PubSub system instead of SQL. For now you're ok with the query as is -- just make sure your ids and foreign keys are indexed.
– Michael Y., Commented Oct 11, 2015 at 16:09

Gordon Linoff · Accepted Answer · 2015-10-11 15:52:43Z

1

For this query:

select *
from Subscription AS sub INNER JOIN
     Topic AS t
     ON sub.topicId = t.topicId INNER JOIN
     Subscriber AS sr
     ON sub.subscriberId = sr.subscriberId  
WHERE t.topic = 'abc' ;

You want the following indexes: Topic(topic, topicId), Subscription(topicId, subscriberId), and Subscriber(subscriberId).

The performance of the query is then going to be based on the volume of data being returned. Returning a million rows is a lot of rows, so that will be an important performance consideration.
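As a rough sketch, the indexes suggested above could be created as shown here. The index names are hypothetical, and the Subscription index is assumed to be meant to cover the two join columns, topicId and subscriberId. If subscriberId is already the primary key of Subscriber, the third index is redundant and can be skipped.

```sql
-- Index names (idx_...) are made up for illustration;
-- table and column names follow the query in the question.
CREATE INDEX idx_topic_topic_topicid
    ON Topic (topic, topicId);

CREATE INDEX idx_subscription_topicid_subscriberid
    ON Subscription (topicId, subscriberId);

-- Only needed if subscriberId is not already the primary key.
CREATE INDEX idx_subscriber_subscriberid
    ON Subscriber (subscriberId);

-- Check which indexes the optimizer actually uses for the query.
EXPLAIN
SELECT *
FROM Subscription AS sub
INNER JOIN Topic AS t ON sub.topicId = t.topicId
INNER JOIN Subscriber AS sr ON sub.subscriberId = sr.subscriberId
WHERE t.topic = 'abc';
```

The EXPLAIN output shows, per table, which index was chosen and an estimate of the rows examined, which is a quick way to confirm the indexes are being picked up.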


3 Comments

AlVaz Over a year ago

Another relatively easy optimization is choosing the driving table based on the total number of rows. If Subscription has more rows than Topic, then starting with SELECT * FROM Topic and joining from there will ultimately scan fewer rows during query execution.

Gordon Linoff Over a year ago

@AlVaz . . . Depending on the selectivity of topic, that could be a better approach under certain circumstances.
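A sketch of the driving-table idea from the comments above, assuming MySQL. For inner joins, the MySQL optimizer normally chooses the join order itself, so reordering the FROM clause alone may not change the plan. STRAIGHT_JOIN forces the tables to be read in the order written, which makes it possible to compare the two plans with EXPLAIN.

```sql
-- Force Topic to be read first, then Subscription, then Subscriber.
-- Table and column names follow the query in the question.
EXPLAIN
SELECT *
FROM Topic AS t
STRAIGHT_JOIN Subscription AS sub
    ON sub.topicId = t.topicId
STRAIGHT_JOIN Subscriber AS sr
    ON sr.subscriberId = sub.subscriberId
WHERE t.topic = 'abc';
```

If the plan without STRAIGHT_JOIN already starts from Topic, forcing the order gains nothing, and leaving the choice to the optimizer is usually the safer default.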

Ankur Garg Over a year ago

Yes AlVaz, there will always be fewer topics than subscriptions. Thanks for suggesting this, but a single topic can still match hundreds of thousands to millions of subscribers. What other ways are there to optimize this? Most importantly, how will MySQL transfer a million rows of data, and how will Hibernate store them? I think performance will be impacted at those points too.
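One possible way to limit how much data is transferred and held in memory at once is to fetch the subscribers in batches, using the last subscriberId seen as a cursor (often called keyset pagination). This is only a sketch under several assumptions: subscriberId is an orderable value that is unique per topic, an index on Subscription(topicId, subscriberId) exists, and the batch size of 10000 is an arbitrary placeholder.

```sql
-- Fetch one batch of subscriber ids for a topic.
-- Replace 0 with the largest subscriberId returned by the previous batch;
-- stop when a batch returns fewer rows than the LIMIT.
SELECT sub.subscriberId
FROM Topic AS t
INNER JOIN Subscription AS sub
    ON sub.topicId = t.topicId
WHERE t.topic = 'abc'
  AND sub.subscriberId > 0
ORDER BY sub.subscriberId
LIMIT 10000;
```

Selecting only the columns that are needed, instead of SELECT *, also reduces the volume sent over the wire. Each batch can be processed and discarded before the next one is fetched, so the application never holds the full list of a million rows in memory.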
