
I recently saw a system design for a bidding service, and one of its use cases was real-time delivery of bid updates for a specific item (or a set of items) to the client.

The design was quite abstract and did not elaborate on the specific details of individual components, so I am trying to figure out how it could be implemented technically.

The design suggests the following:

  • bids are sent asynchronously from a backend service called the bids service to an event queue (the design does not specify which queue service exactly; it only says it is an event queue)
  • there is a backend service called the real-time service which consumes this event queue and pushes real-time updates to clients over WebSocket connections

Below is my rough understanding of how it could be implemented:

Let's assume that "real-time service" has 2 instances (instance1, instance2).

Suppose user1 is connected to instance1 and user2 to instance2 via WebSocket, and both subscribe to item5.

  • if AWS SQS is used as the queue, then all instances will poll it, but every message will be consumed and ACKed by only one of those instances and then deleted from the queue. The other instances will never receive it and hence cannot send the update to their connected users.
    Example: when an event for item5 becomes available in the queue, it is consumed by instance2 and sent to user2, but user1 never gets it because instance1 did not consume it.

  • if AWS Kinesis is used as the queue, the above problem disappears because Kinesis persists records for a configurable retention window, so both instances can consume the event and send it to user1 and user2. The downside is that every instance processes every incoming Kinesis event, whether or not its connected users actually need it.
    Example: assume many users are connected to an instance and all are subscribed to item5, but the queue contains only events for item3. For every consumed event the service still has to check its client subscriptions, find that nobody needs item3, and discard it.

  • if a pub/sub system such as Kafka or RabbitMQ is used (with one topic per itemID), things get much better than with Kinesis: an instance can subscribe only to the topics it cares about and ignore everything else, and when an event is consumed from one of those topics, the service looks up which clients are subscribed to it and sends the event to all of them.
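In code, the consume-and-dispatch step I have in mind would look roughly like the sketch below. All names here (`Event`, `dispatch`, the `send` callback) are my own, and a real service would feed this from an actual Kafka/RabbitMQ consumer and send over WebSockets:

```python
# Minimal sketch of the per-instance dispatch step, assuming each consumed
# event carries the item it belongs to. Names are illustrative, not from
# the original design.
from dataclasses import dataclass

@dataclass
class Event:
    item_id: str
    payload: str

def dispatch(event, subscriptions, send):
    """Forward an event to every client subscribed to its item."""
    for client_id in subscriptions.get(event.item_id, set()):
        send(client_id, event.payload)

# Stand-in for WebSocket sends: just collect (client, payload) pairs.
sent = []
subs = {"item5": {"user1"}}
dispatch(Event("item5", "bid: $100"), subs, lambda c, p: sent.append((c, p)))
```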

In the reasoning above, I also assume that every service instance maintains an in-memory mapping from itemId to the list of subscribed clientIds, so it can quickly find the clients waiting for updates on a specific item.
For example, if client1 is connected and waits for item1 and item2, and client2 waits for item2 and item3 on the same instance of the real-time service, the map will look like:

item1 -> {client1}  
item2 -> {client1, client2}  
item3 -> {client2}
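A minimal sketch of that per-instance map (the class and method names are illustrative; a real service would call `subscribe`/`unsubscribe_all` from its WebSocket connect and disconnect handlers):

```python
from collections import defaultdict

class SubscriptionRegistry:
    """Per-instance, in-memory itemId -> {clientId} map (illustrative sketch)."""

    def __init__(self):
        self._by_item = defaultdict(set)

    def subscribe(self, client_id, item_id):
        self._by_item[item_id].add(client_id)

    def unsubscribe_all(self, client_id):
        # Called when the client's WebSocket disconnects.
        for clients in self._by_item.values():
            clients.discard(client_id)

    def subscribers(self, item_id):
        return self._by_item.get(item_id, set())

# Reproducing the example map above.
reg = SubscriptionRegistry()
reg.subscribe("client1", "item1")
reg.subscribe("client1", "item2")
reg.subscribe("client2", "item2")
reg.subscribe("client2", "item3")
```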

Does all my reasoning make sense here? Is this how real-time streaming is actually implemented?
Note that this applies not just to bidding systems but also to instant messaging, where clients subscribe to other clients or to groups (and the real-time service is effectively a chat service).

PS. I have never used a queue/pub-sub subscriber as part of a web backend application. Is it good practice to do so?

  • If you are looking for 'opinions', you might get a better response at: reddit.com/r/aws Commented Nov 23, 2024 at 21:19
  • @JohnRotenstein opened. would appreciate any feedback (either here or there): reddit.com/r/aws/comments/1gybus1/… Commented Nov 24, 2024 at 11:08

1 Answer


I feel like you have a good general idea of the main messaging services that AWS provides, but I think a couple of details are missing:

SQS

For this use case it might be a good option to consider an SQS FIFO queue, which guarantees the messages are handled in order.

Otherwise one bid might reach the backend before another, even though it was published at a later timestamp.

Furthermore, make sure that the visibility timeout is larger than the time you need to process a message. Otherwise you could accidentally pull the same SQS message twice. This is a general best practice though, nothing specific to your use case.
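As a rough sketch of what that looks like with a boto3-style client (the queue URL and the `poll_once`/`handler` names are illustrative; `sqs` is any object with the boto3 SQS client's `receive_message`/`delete_message` interface):

```python
def poll_once(sqs, queue_url, handler, visibility_timeout=60):
    """Receive a batch of messages and delete each one only after the
    handler succeeds. visibility_timeout must exceed the worst-case
    processing time, or the same message may become visible again and
    be redelivered while this consumer is still working on it.
    `sqs` is a boto3-compatible client, e.g. boto3.client("sqs")."""
    resp = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,
        VisibilityTimeout=visibility_timeout,  # hide message while processing
        WaitTimeSeconds=20,                    # long polling
    )
    for msg in resp.get("Messages", []):
        handler(msg["Body"])  # process first (at-least-once delivery)
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
```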

Kinesis

Kinesis will not force each consumer to read all messages.

With Kinesis you have a DynamoDB lease table where each consumer registers which shards it will be pulling records from.

In this setup with 2 instances it could be a good idea to have 2 shards (or more if needed), so that each instance only pulls from the shards it has registered in the lease table.

Furthermore, it would probably be a good idea to use the bidding item id as the partition key for the Kinesis records, ensuring that every bid for a given item always goes to the same shard, since Kinesis only guarantees ordering at the shard level.
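To illustrate why the partition key achieves this: Kinesis takes the MD5 hash of the partition key and maps the resulting 128-bit integer onto the shards' hash key ranges, so the same key always lands on the same shard. A small simulation of that routing (assuming the shards split the hash space evenly, which is the default when a stream is created):

```python
import hashlib

def shard_for(partition_key: str, num_shards: int) -> int:
    """Simulate Kinesis routing: MD5(partition_key) interpreted as a
    128-bit integer, mapped onto num_shards equal hash-key ranges."""
    h = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    range_size = (1 << 128) // num_shards
    return min(h // range_size, num_shards - 1)
```

The point is determinism: every bid for "item5" hashes to the same shard, so one instance sees all of item5's bids in order.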

Backend Application

Depending on how your application runs, it might still be a good idea to have some sort of locking in place in the application itself.

If you are sure it will only ever pull and process one message at a time, though, this might be overly cautious.

As to the "I also assume that every service instance maintains an in-memory mapping..." part:

I'm not quite sure whether you mean the "leasing" part or the storage of the bids themselves. In both cases, though, I would ask why you think it should be in memory.

A centralised location for both the "leasing" and the storage of the bids seems more appropriate, since it is then decoupled from the instances.

With Kinesis there is the DynamoDB lease table, and with Kafka I guess it's something different, but still something central that all instances communicate with. Otherwise there would be no way of detecting collisions where multiple consumers try to process the same shards, etc.

As for the storage of the bids themselves, that should also generally be in a separate storage (e.g. RDS / DynamoDB) since otherwise they would of course disappear if you had an issue with your application or performed a deployment etc.

Edit: I believe you would need the KCL and KPL libraries for the lease table functionality with Kinesis I mentioned above: https://docs.aws.amazon.com/streams/latest/dev/kcl.html

And here's a link regarding the lease table: https://docs.aws.amazon.com/streams/latest/dev/kcl-dynamoDB.html


3 Comments

Regarding Kinesis: so you mean that a client subscribing to, say, "item5" would need to connect to the specific instance of the real-time service that consumes the Kinesis shard holding the "item5" records?
Regarding "I'm not quite sure if you're talking about the "leasing" part or the data storage of the bids": it's for temporarily storing clientId-itemId associations in instance memory, so that when the backend instance consumes a record from the queue, it can quickly check which connected clients are waiting for that record. This mapping is updated whenever a client connects or disconnects.
Regarding SQS: but how do we ensure that a record is consumed by ALL instances of the real-time service? A record is consumed (and removed from the queue) by only one consumer instance, yet it needs to reach all of them. Unless we use the same strategy as for Kinesis, but SQS does not let a consumer poll only a specific message group ID. Every instance could have its own SQS queue, but that seems too complicated given that the instances scale in and out.
