I am building a notification system in which a single topic (an arbitrary string) can have a million subscribers. Say a million subscribers want to receive notifications for the topic "abc".

We store the data for these subscribers in a MySQL database.

So for the topic "abc" I want to retrieve this list of a million subscribers.

I am using Hibernate as my ORM. To select the list of subscribers, I run this query:

select * from Subscription AS sub INNER JOIN Topic AS t ON sub.topicId = t.topicId 
INNER JOIN Subscriber AS sr ON sub.subscriberId= sr.subscriberId  
WHERE t.topic = 'abc' 

Since I have millions of subscribers in my database and this query will scan millions of rows, I believe it will take a long time to execute.

I am retrieving the list in my DAO (using Spring JPA) as

List<Subscription> subList = subRepository.findByTopicName(eventBean.getTopic());

I want the result set to be populated as fast as possible, since the idea is to send notifications in real time.

Given the above, will querying the database this way, with millions of rows involved, give the desired performance? My guess is no.

How can I optimize my query, and how should I retrieve the data so that I can achieve real-time performance?

I know that indexes will improve performance, but how do I retrieve all this data at once? Is it possible with Hibernate to store this much data in a cache? Would such caching be efficient?

Also, will pagination help here?

I am not looking for exact solutions here, just ideas from people who have solved this kind of problem before.

  • I would suggest you scale your application gradually. Building and designing something now to handle millions of subscribers may be premature. Make sure you can handle thousands of users first. Then if you get there, make adjustments to handle hundreds of thousands, then millions. I think by the time you get to millions you may want to consider a PubSub system instead of SQL. For now you're ok with the query as is -- just make sure your ids and foreign keys are indexed. Commented Oct 11, 2015 at 16:09

1 Answer

For this query:

select *
from Subscription AS sub INNER JOIN
     Topic AS t
     ON sub.topicId = t.topicId INNER JOIN
     Subscriber AS sr
     ON sub.subscriberId = sr.subscriberId  
WHERE t.topic = 'abc' ;

You want the following indexes: Topic(topic, topicId), Subscription(topicId, subscriberId), and Subscriber(subscriberId).

The performance of the query is then going to be based on the volume of data being returned. Returning a million rows is a lot of rows, so that will be an important performance consideration.
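One way to keep the volume manageable is to read the subscribers in fixed-size chunks rather than as one list. The following is only a sketch of keyset pagination (each page starts after the last id seen), not the poster's actual code. The class name, the `PageFetcher` interface, and the in-memory stand-in for the database are all hypothetical. It also assumes that `subscriberId` is a numeric, increasing key, which the question does not state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Sketch: walk a large subscriber list in chunks using keyset pagination. */
class ChunkedSubscriberReader {

    /**
     * Hypothetical page source. Against MySQL this could be backed by a query like
     *   SELECT subscriberId FROM Subscription
     *   WHERE topicId = ? AND subscriberId > ?
     *   ORDER BY subscriberId LIMIT ?
     * which can use an index on Subscription(topicId, subscriberId).
     */
    public interface PageFetcher {
        List<Long> fetchAfter(long lastSeenId, int pageSize);
    }

    /** Feeds every chunk to the handler and returns the total number of ids seen. */
    public static long forEachChunk(PageFetcher fetcher, int pageSize,
                                    Consumer<List<Long>> handler) {
        long lastSeenId = Long.MIN_VALUE; // assumes real ids are greater than this
        long total = 0;
        while (true) {
            List<Long> page = fetcher.fetchAfter(lastSeenId, pageSize);
            if (page.isEmpty()) {
                break;
            }
            handler.accept(page); // e.g. hand the chunk to a notification sender
            total += page.size();
            lastSeenId = page.get(page.size() - 1);
            if (page.size() < pageSize) {
                break; // a short page means there is nothing left to read
            }
        }
        return total;
    }

    /** In-memory stand-in for the database, only here to make the sketch runnable. */
    public static PageFetcher inMemory(List<Long> sortedIds) {
        return (lastSeenId, pageSize) -> {
            List<Long> page = new ArrayList<>();
            for (long id : sortedIds) {
                if (id > lastSeenId) {
                    page.add(id);
                    if (page.size() == pageSize) {
                        break;
                    }
                }
            }
            return page;
        };
    }
}
```

With this shape, memory use is bounded by the page size rather than by the number of subscribers, and notifications for the first chunk can go out before the last chunk has been read. Whether this beats a single large fetch depends on the page size and on the round-trip cost to the database, so it is worth measuring.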

3 Comments

Another relatively easy optimization is choosing the driving table based on the total number of rows. If Subscription has more rows than Topic, then selecting from Topic first and joining the other tables to it means you will ultimately scan fewer rows during query execution.
@AlVaz . . . Depending on the selectivity of topic, that could be a better approach under certain circumstances.
Yes @AlVaz, there will always be fewer topics than subscriptions. Thanks for suggesting this, but a single topic can still be matched to anywhere from lakhs to millions of subscribers. What other ways are there to optimize this? Most importantly, how will MySQL transfer these million rows, and how will Hibernate store them? I think performance will be impacted at those points too.
