
I have a tricky situation where I need a process to run on a queue of messages in the order they arrive, but there are going to be several hundred queues that need to be dynamically created

Each message relates to a customer, so I would have 2 queues per customer

I need a way of having a queue trigger for each queue, but there are potentially going to be several thousand queues

Queue triggers appear to require the name to be hardcoded for the queue

Obviously I can’t have a function in my code for each queue name

Has anyone got any idea of how I can do this?

Let’s say we have 3 customers with 5 messages each

This would be 3 queues with 5 messages

I want to be able to process all 3 customers at once, not have everything in a single queue. The customers are not related, so the messages being queued have nothing to do with each other at the customer level, and I don't want customer 3's messages having to wait whilst customers 1 and 2 are being processed

Does anyone know how to do this?

Paul

  • I think you have an architecture problem. From Microsoft "A maximum of 10,000 queues can be created within a single Service Bus". So your approach for "several thousand queues" is not the desired solution. Commented Sep 2, 2023 at 11:08
  • I have a way of getting around that one. I’m more looking for a solution for the concept Commented Sep 2, 2023 at 11:10
  • This cannot be done with a queue trigger on Azure Functions. The trigger is bound to one specific queue when created and cannot be changed. Your best bet would be to create a hosted service that uses the Service Bus client library directly. Commented Sep 2, 2023 at 13:28
  • Is there a way to be notified which customers' messages will be coming? Are you planning to have a queue per customer? When messages are created for customer X, how do you guarantee the queue for X is there and is not being deleted? Commented Sep 2, 2023 at 19:17
  • @JesseSquire what would that hosted service do? I want to be able to have multiple customers being processed at once but for each customer’s messages to be processed one by one Commented Sep 2, 2023 at 22:06

3 Answers


The conceptual solution to this problem is to establish a fixed set of queues and to manage which queues the messages go into to ensure optimal performance. We call this partitioning, and instead of rolling your own, most of the Azure Service Bus services have support for partitioning built in: Partitioned queues and topics
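As a rough sketch (not from the question; the connection string and queue name are placeholders), enabling partitioning is simply a property set when the queue is created with the ServiceBusAdministrationClient:

using Azure.Messaging.ServiceBus.Administration;
using System;

// Placeholder connection string and queue name.
var adminClient = new ServiceBusAdministrationClient("<service-bus-connection-string>");

var options = new CreateQueueOptions("payments")
{
    // Spreads the single named queue across multiple internal message brokers/stores.
    EnablePartitioning = true
};

QueueProperties queue = await adminClient.CreateQueueAsync(options);
Console.WriteLine($"Created partitioned queue: {queue.Name}");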

It is also not necessary to hardcode connections or queue names; you can make these configuration settings. Then, instead of directly coding all queues and connections, you could split the deployment over multiple configured instances. While still unmanageable IMO, this moves the issue from a coding problem to a deployment problem.
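A minimal sketch of what that looks like in code (the setting names ServiceBusConnection and CustomerQueueName are hypothetical; only the keys appear in code, the values come from each deployment):

using Azure.Messaging.ServiceBus;
using System;

// Hypothetical setting names: the values are supplied per deployment
// (app settings, environment variables, Key Vault references, etc.).
string connectionString = Environment.GetEnvironmentVariable("ServiceBusConnection");
string queueName = Environment.GetEnvironmentVariable("CustomerQueueName");

await using var client = new ServiceBusClient(connectionString);
ServiceBusReceiver receiver = client.CreateReceiver(queueName);
// ... receive and process as normal; the code never names a specific queue.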

  • Let me convince you not to implement dedicated queues per message producer

    When a customer presents with this requirement:

    I want to be able to process all 3 customers at once, not have everything in a single queue. The customers are not related, so the messages being queued have nothing to do with each other at the customer level, and I don't want customer 3's messages having to wait whilst customers 1 and 2 are being processed

    My first response is "Why?" What is the likelihood that multiple customers will make requests at the same time, and what will be the frequency of the requests? How long does it take to process a single message, and how long is it acceptable for the customer to wait for a response?

    When we are dealing with a scale of, say, 10,000 message generators, it is expensive to think linearly. Even though 1 queue per customer allows potential for parallel processing, the queues and the management of them grow linearly. It helps to think differently, about batching, partitioning and sharding, to optimise utilisation of a smaller set of resources in a manageable way.

    The reality of how code is executed is that there is always some impact from each message on the others. Even if you spawn dedicated threads, there are limits to the number of threads your process can consume, and especially with hyper-threading and virtualization there is always some level of impact from each message on the processing of others, unless you deploy to isolated servers. Attempting to service this number of customers without each impacting the other using individual queues is going to require processing across multiple instances, which means you will always need some level of orchestration at runtime.

The problem with a high number of (or infinite) queues is that you need to manage them. To ensure sequential consistency you would need to spawn a separate thread to monitor each queue and receive messages. If each queue is likely to receive messages at high frequency, this might be justified, but as soon as you introduce locks to ensure that a receiving thread is the only one processing messages for a given sender, there is no reason that any idle thread couldn't process any message.

Consider industrial IoT implementations that process streams of messages from tens of thousands of devices, with throughput measured in thousands of messages per minute... According to your model, I would need thousands of threads or methods configured to receive messages. I would probably want to spread the load over multiple servers, but how would I know which server had registered listeners for each sender? Then we consider availability of this solution: what if one of the servers goes down, how will I detect that, and how will I know which senders to subscribe to again?

The power of the Azure Platform comes from designing solutions so that they can be deployed in a way where the platform can manage your allocation of resources for you and move the workloads from busy resources to idle resources. Your requested solution becomes a lot harder to scale out, and your scale metric is now the number of message senders (customers) and is not related to the throughput at all, which becomes wasteful and expensive.

Partitioning is a trade-off: your idea describes the optimal theoretical throughput but requires infinite access to resources. Partitioning allows us to optimise throughput by utilising a much lower, and potentially constant, amount of resources, all while still maintaining sequential consistency.

Each customer would be mapped to a partition; this is determined by the partition key. We try not to constrain specific message generators to specific partitions; the platform will do this for us and will ensure that messages with the same partition key are always sent to the same internal partitioned queue.
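On the sending side this is just a property on the message. A sketch, assuming a partitioned queue named "payments" and a customer id used as the key (both placeholders):

using Azure.Messaging.ServiceBus;

await using var client = new ServiceBusClient("<service-bus-connection-string>");
ServiceBusSender sender = client.CreateSender("payments");

var message = new ServiceBusMessage("payment payload")
{
    // Messages that share a PartitionKey always land on the same internal partition,
    // so their relative order is preserved.
    PartitionKey = "customer-42"
};

await sender.SendMessageAsync(message);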

Note: This is an over-simplification. In many configurations, if a partition goes offline, messages can be re-routed to another partition; the overall ordering consistency is then reset, and around this event some newer messages might be processed before older ones on the previous partition.

We can still process each partition in isolation, but this means that if 2 customers are assigned to the same partition, the processing of the first message will delay processing of the second one. This is the trade-off: processing 1 message does not hold up all other customers until that message is processed, only those customers in the same partition.

Partitioned entities limitations
Currently Service Bus imposes the following limitations on partitioned queues and topics:

  • Service Bus currently allows up to 100 partitioned queues or topics per namespace for the Basic and Standard SKU.
  • Each partitioned queue or topic counts towards the quota of 10,000 entities per namespace.

Partitioning helps us manage the problem of splitting the processing over multiple servers or processing instances. Using the SDK, each instance, when idle, will poll the partitions and attempt to acquire a lock, establishing a singleton connection to a specific partition. This prevents other processes from accessing that partition. When the instance goes idle or the lease on the partition ends, the connection is dropped and the next idle instance will pick it up.

There is a bit going on under the hood, but this means that we can effectively scale out to a number of instances and the load will be routed for us without having to write too much code.
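If you want that per-customer, one-at-a-time behaviour on the consumer side today, the closest off-the-shelf mechanism in the Azure.Messaging.ServiceBus SDK is message sessions (a SessionId per customer on a session-enabled queue); this is a related locking mechanism to the partition behaviour described above, not identical to it. A rough sketch, with placeholder names and limits:

using Azure.Messaging.ServiceBus;
using System;
using System.Threading.Tasks;

await using var client = new ServiceBusClient("<service-bus-connection-string>");

// Requires a session-enabled queue; each customer's messages carry that customer's SessionId.
ServiceBusSessionProcessor processor = client.CreateSessionProcessor("payments",
    new ServiceBusSessionProcessorOptions
    {
        MaxConcurrentSessions = 8,        // how many customers this instance serves at once
        MaxConcurrentCallsPerSession = 1  // strictly one message at a time per customer
    });

processor.ProcessMessageAsync += async args =>
{
    Console.WriteLine($"Customer {args.SessionId}: {args.Message.Body}");
    await args.CompleteMessageAsync(args.Message);
};

processor.ProcessErrorAsync += args =>
{
    Console.WriteLine(args.Exception.Message);
    return Task.CompletedTask;
};

await processor.StartProcessingAsync();
Console.ReadKey();   // keep the sample running until a key is pressed
await processor.StopProcessingAsync();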

For IoT-level throughput, consider using Event Hubs and EventProcessorHost.

If you really need more than 100 partitions, we simply create another level of partitioning by splitting customers between multiple partitioned queues: 10 named queues could service 1,000 partitions. If you really needed to service 10,000 customers and ensure that no customer is waiting for another customer's message processing, then you would need 100 named queues. This at least is more manageable than 10,000 queues.
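That second level of partitioning can be as simple as a stable hash from the customer key to one of N configured queue names. A sketch (the queue naming and N = 10 are arbitrary):

using System;

// Maps a customer identifier to one of a fixed set of queue names.
// Use a deterministic hash (FNV-1a here) rather than string.GetHashCode(),
// which is not stable across processes.
static string QueueForCustomer(string customerId, uint queueCount = 10)
{
    uint hash = 2166136261;
    foreach (char c in customerId)
    {
        hash = (hash ^ c) * 16777619u;
    }
    return $"payments-{hash % queueCount}";
}

Console.WriteLine(QueueForCustomer("customer-42"));  // always the same queue for this customer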

In most implementations, although theoretically undesirable, the practical implication of 1 customer waiting 200 milliseconds for another message to process first is going to be negligible. 3 seconds might even be acceptable, as long as we maintain consistency of the sequence. Depending on how long it takes to process each message and the frequency of the messages being generated, you might find it acceptable for dozens of message generators to be assigned to the same partition (remember each message generator/customer has their own unique partition key; the platform will assign them to the available partitions using logic based on the key).

When dealing with paying customers, you might consider that some customers could pay a premium for dedicated resources or for access to environments with a lower number of concurrent customers. For this you might even go so far as deploying a separate service bus and limiting the number of customers who send messages to it, or you might just create a separate queue, again limiting the number of customers who send messages to that queue.

Remember that the connection string, queue name and partition can be allocated through configuration, so your code could be a single method implementation and you could deploy it to multiple app services, each with a different configuration, or you might use deployment slots, again with each slot assigned a different configuration.

If you really think that you require the scale, you might use a combination of these techniques, or you might automate the deployment so that new queues are created and the same code is deployed to dedicated instances, with the only difference being the configuration.

Really stop to analyse the actual requirement and acceptable tolerances in terms of delays to message delivery and how you want to handle failure. Partitioning is a pragmatic solution that makes it possible to achieve near real-time processing for reasonable runtime cost and development effort.


2 Comments

Thanks for this very detailed response. I am dealing with thousands of payments that must be processed one at a time. One customer's payments should not delay another customer's, hence the need to separate the customers somehow; partitioning seems to have potential
In many cases, for very high burst transaction rates, it is usually better to change the experience to embrace queues and waiting before the transaction is made; this is another hybrid style that many event-ticketing agencies implement quite successfully. "One customer never delaying another..." is a very narrow view both of consumers and of the technical constraints of the technology available. There will always be a physical maximum number of customers before you need to change your approach. A partitioned solution should be significantly cheaper to develop and operate.

I agree with @Sean and @Jesse Squire. When a trigger is created, it is fixed to a single queue and cannot be altered. Creating a hosted service that directly utilises the Service Bus client library would be your best option. If you want to use numerous queues, you can use ServiceBusClient to create a ServiceBusReceiver per queue: receive a certain number of messages, then shut down the receiver and move on to the next. You may choose to do that concurrently or assign different services to different portions of it.

Below is C# code illustrating how to use the ServiceBusClient to create ServiceBusReceiver instances that handle messages from multiple queues concurrently:

using Azure.Messaging.ServiceBus;
using System;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    const string ServiceBusConnectionString = "Endpoint=sb://siliconsb45.servicebus.windows.net/;SharedAccessKeyName=RootManageSharedAccessKey;SharedAccessKey=xxxxxMubQxxxx";
    const string QueueName1 = "customer";
    const string QueueName2 = "customer1";

    static async Task Main(string[] args)
    {
        await ProcessQueuesConcurrently();
    }

    static async Task ProcessQueuesConcurrently()
    {
        ServiceBusClient client = new ServiceBusClient(ServiceBusConnectionString);

        ServiceBusReceiver receiver1 = client.CreateReceiver(QueueName1);
        ServiceBusReceiver receiver2 = client.CreateReceiver(QueueName2);

        // Stop processing after a fixed duration (e.g. 30 seconds)
        using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(30));

        // Start receiving messages from Queue 1
        Task processingTask1 = ProcessMessagesAsync(receiver1, cts.Token);

        // Start receiving messages from Queue 2
        Task processingTask2 = ProcessMessagesAsync(receiver2, cts.Token);

        // You can add more queues and tasks as needed...

        // Wait for the processing tasks to complete
        await Task.WhenAll(processingTask1, processingTask2);

        // Close the receivers and client
        await receiver1.CloseAsync();
        await receiver2.CloseAsync();
        await client.DisposeAsync();
    }

    static async Task ProcessMessagesAsync(ServiceBusReceiver receiver, CancellationToken cancellationToken)
    {
        try
        {
            while (!cancellationToken.IsCancellationRequested)
            {
                // Wait briefly for the next message; null means the queue is currently empty
                ServiceBusReceivedMessage message =
                    await receiver.ReceiveMessageAsync(TimeSpan.FromSeconds(5), cancellationToken);

                if (message != null)
                {
                    Console.WriteLine($"Received message: {message.Body}");
                    // Process the message here

                    // Complete the message to remove it from the queue
                    await receiver.CompleteMessageAsync(message, cancellationToken);
                }
                else
                {
                    // No more messages, exit the loop
                    break;
                }
            }
        }
        catch (OperationCanceledException)
        {
            // The processing window elapsed; stop receiving from this queue
        }
        catch (Exception ex)
        {
            Console.WriteLine($"Error processing messages: {ex.Message}");
        }
    }
}

Output:

Messages were received from both queues (screenshot omitted).

  • You can also create multiple Function instances. To handle messages from various queues simultaneously, each instance might have a different queueName configuration parameter.

  • Use the Azure Service Bus SDK to programmatically create a new queue whenever a new customer is created (see the sketch below). Reference: refer to the code samples for other languages linked on the left.
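A minimal sketch of that provisioning step (the connection string and queue naming convention below are placeholders):

using Azure.Messaging.ServiceBus.Administration;
using System.Threading.Tasks;

public class CustomerQueueProvisioner
{
    private readonly ServiceBusAdministrationClient _adminClient =
        new ServiceBusAdministrationClient("<service-bus-connection-string>");

    // Call this from wherever new customers are registered.
    public async Task EnsureQueueAsync(string customerId)
    {
        string queueName = $"customer-{customerId}";   // naming convention is up to you

        // Avoid the exception thrown when creating a queue that already exists.
        if (!(await _adminClient.QueueExistsAsync(queueName)).Value)
        {
            await _adminClient.CreateQueueAsync(queueName);
        }
    }
}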

2 Comments

This is really helpful, thanks! How would I be able to use this approach for creating a new instance? Ideally I would like to have the function instances scale automatically provided they don't process messages out of order
I'm not committed to having multiple queues if there is a way to partition and have message listeners listen for a specific session id

Queue triggers appear to require the name to be hardcoded for the queue

This is incorrect. You should not be hardcoding the connection, and you can also make the queue name configurable. You only hardcode the key that is used to access the setting:
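For example (a sketch using the in-process Functions model; %CustomerQueueName% and ServiceBusConnection are hypothetical setting names), the binding expression syntax resolves the queue name from app settings at startup:

using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class CustomerQueueFunction
{
    [FunctionName("ProcessCustomerQueue")]
    public static void Run(
        // The %...% token and the Connection name are looked up in app settings, so each
        // deployed instance (or slot) can point at a different queue without code changes.
        [ServiceBusTrigger("%CustomerQueueName%", Connection = "ServiceBusConnection")] string message,
        ILogger log)
    {
        log.LogInformation($"Processed message: {message}");
    }
}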

That there are not a lot of examples out there using an "infinite" number of queues is an indication that this is not a common solution to this problem. I can however assure you that this problem comes up quite frequently as a product request; it just doesn't often make it into the requirements ;)

I don't want customer 3's messages having to wait whilst customers 1 and 2 are being processed

As a general concept, this is very hard to ensure; at some point in the transaction chain there is always going to be a pinch point where one customer will have to wait. But there are ways to minimise the delays.

I talked briefly about this in my first response, but there are techniques that are designed for high compute needs per user, and there are different techniques for supporting a high volume of low compute needs. Either can work, but the costs can quickly blow out if you pick the wrong one.

From a sales and payment point of view, shopping carts and reserving inventory items before the actual transaction are generally all you need. The speed of actual processing of a payment is often governed by the payment merchant, and because the allocation of the inventory items being purchased has been pre-approved or allocated before the physical transaction, there is no reason that any customer waiting would negatively affect the others, because their sale is already guaranteed. Scenarios like this usually work really well with a minimal number of queues.

  • The important aspect of this model is that all of the information needed to process the payment is collated into a single payload, which ultimately is a single message on the queue.

This works well for online stores that have fixed inventory with high demand. You might incorporate quantity limits and a timeout on the user session so that one user can't select items and sit on them specifically to prevent other users from having access. Once the session is cancelled or times out, the reserved stock becomes available again.

  • The processing of the initial request may involve a number of steps; a microservice implementation might model these steps as separate queues, 1 queue per step, chained such that the completion of step 1 enqueues the message for step 2 (see the sketch after this list).
  • To determine if this is appropriate requires understanding WHY it is important that Customer 3 is not waiting for Customer 1 and 2 to be processed.
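As a sketch of that chaining (the step queue names are made up), the handler for step 1 finishes its work and enqueues the message for step 2, carrying the same partition key so per-customer ordering survives across steps:

using Azure.Messaging.ServiceBus;
using System.Threading.Tasks;

public class Step1Handler
{
    private readonly ServiceBusClient _client;

    public Step1Handler(ServiceBusClient client) => _client = client;

    // Invoked by whatever receives from the hypothetical "step1-validate" queue.
    public async Task HandleAsync(ServiceBusReceivedMessage message)
    {
        // ... do the step 1 work (validation, enrichment, etc.) ...

        // Completion of step 1 enqueues the work item for step 2.
        ServiceBusSender nextStep = _client.CreateSender("step2-charge-payment");
        await nextStep.SendMessageAsync(new ServiceBusMessage(message.Body)
        {
            PartitionKey = message.PartitionKey   // preserve per-customer ordering
        });
    }
}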

This wouldn't work for peer-to-peer trading scenarios where spot prices are affected by demand, or where there is a general ledger of trades for sale and bids to buy. This is the type of scenario where you specifically do want each customer to wait while other transactions are processed, so the advice here is not suited to those scenarios.

10,000 queues because I have 10,000 customers is just ludicrous. Your plan is presumably to grow, so let's say 1 million customers and queues... is that a reasonable way to utilise the technology we have available?

Ask yourself these questions first:

  • How many concurrent users will be connected?
  • How many of those users will be making transactions?
  • How many transactions are you expecting to be made concurrently?
  • What is the average time between the user logging in and making a transaction?
  • Are there specific or known peak times when concurrent transactions are high and other times where the system might be idling?
  • What is an acceptable wait time between making a transaction and waiting for a reply?
  • How long does it take to process a single transaction?
  • What is the impact on the customer, other than time, if they do have to wait?

The reason to ask these questions is that if the actual number of concurrent transactions is low, and the transaction processing time is low, then is there a business case to bother worrying about potential waits that users might experience?

If the concurrency and transaction rates are high, then you first need to use a technology stack that can handle the load. You also need to identify whether the maximum throughput of your end-to-end networking can even handle the expected peak loads.

  • If you already have a solution in place, scaling out sideways through multiple queues to allow an infinite degree of parallel processing is not a good solution. All parallel solutions inherit some form of orchestration overhead; there is always a sweet spot where the cost and effort start to outweigh the benefits, or where managing the parallel threads starts to slow or negatively impact the entire throughput.

For high compute loads, like game engines, one solution that has been used is to spin up entire environments per user session, but only when those sessions are active. In a practical sense, this has significant cold-start times compared to many other solutions, but if there is a delay between logging in and needing the compute, then perhaps the cold-start time can be hidden from the user, who is spending time in the front-end experience anyway. If there isn't a natural delay in the user experience but the quality of service is important to you, then either make them wait, or specifically re-engineer the user experience such that there is enough of a delay that the service will be ready by the time the customer makes a transaction.

In this scenario, your code only needs a single queue; what changes between the environments is the connection strings.

At the end of the user session the environment can be decommissioned, or you could use it for the next customer.

This is a different implementation of the partition concept I described in my first response, but ultimately the same underlying principle. Depending on your compute needs and the criticality of your business logic, partitioning the whole environment might make sense. You might even partition the environments, partition the queues within them, and also implement shopping-cart-style pre-allocation, combining the benefits of each approach to optimise the user experience and throughput of transactions.

At the core of this will always be an analysis of the acceptable costs per customer/transaction as well as the development effort to get this to market at all. You might find that you start in one direction at first then as you expand you might need to implement something new.

Minimising the customer interactions into complete batches is the best place to start, implementing pre-allocation if that makes sense in your scenario.

Implementing partitioning of the queue processing is a pragmatic next step.

Partitioning the users into specific queues should be manageable after the previous two steps have been optimised to their practical limits.

Finally, once you need hyper-scale, partitioning across whole environment deployments should be a natural progression, knowing that the environments themselves have already been optimised.

