When dealing with
- High-throughput scenarios (thousands of files per second)
- Varying schemas arriving from Azure Event Hubs
- Storing the data in SQL Server
performance, flexibility, and reliability are the key factors.
Recommended Approach: Azure Functions with Event Hub Trigger
Why Azure Functions with an Event Hub trigger is better:
- Direct Integration: Azure Functions can directly connect to Event Hub using an Event Hub trigger, which eliminates the overhead of HTTP calls. This direct connection reduces latency and improves performance.
- Flexibility: Logic Apps can become complex when dealing with a variety of schemas, often requiring repeated actions. Azure Functions offer more flexibility by allowing custom code to handle different schemas effectively.
- Scalability: Event Hub can handle high throughput and provides a scalable way to ingest a large volume of events.
- Error Handling: With Azure Functions you can implement robust error handling and logging to deal with failed events, for example by routing messages that cannot be processed to a dead-letter store.
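On the SQL Server side, high throughput usually means batching rows before writing (for example with `SqlBulkCopy`) instead of issuing one INSERT per event. The chunking helper below is a minimal, generic sketch of the batching step; the batch size of 2 in the usage example is illustrative only:

```csharp
using System.Collections.Generic;

public static class Batching
{
    // Split a sequence into fixed-size batches; the last batch may be smaller.
    public static IEnumerable<List<T>> Chunk<T>(IEnumerable<T> items, int size)
    {
        var batch = new List<T>(size);
        foreach (var item in items)
        {
            batch.Add(item);
            if (batch.Count == size)
            {
                yield return batch;
                batch = new List<T>(size);
            }
        }
        if (batch.Count > 0)
            yield return batch;
    }
}
```

Each batch returned by `Chunk` can then be loaded into a `DataTable` and written in one round trip with `SqlBulkCopy.WriteToServer`.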
Example (isolated worker model):

```csharp
using System;
using Microsoft.Azure.Functions.Worker;
using Microsoft.Extensions.Logging;

public class EventHubsFunction
{
    private readonly ILogger<EventHubsFunction> _logger;

    public EventHubsFunction(ILogger<EventHubsFunction> logger)
    {
        _logger = logger;
    }

    // Retry up to 5 times, waiting 10 seconds between attempts.
    [Function(nameof(EventHubFunction))]
    [FixedDelayRetry(5, "00:00:10")]
    [EventHubOutput("dest", Connection = "EventHubConnection")]
    public string EventHubFunction(
        [EventHubTrigger("src", Connection = "EventHubConnection")] string[] input,
        FunctionContext context)
    {
        _logger.LogInformation("First Event Hubs triggered message: {msg}", input[0]);
        var message = $"Output message created at {DateTime.Now}";
        return message;
    }
}
```
See the Azure Event Hubs trigger for Azure Functions documentation for details.
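To make the flexibility point concrete: a common pattern for varying schemas is to peek at a discriminator property before deserializing into a concrete type. This is a minimal sketch assuming each event carries a hypothetical `eventType` field; `OrderEvent` and `SensorEvent` are illustrative types, not ones from the original question:

```csharp
using System;
using System.Text.Json;

public record OrderEvent(string OrderId, decimal Amount);
public record SensorEvent(string DeviceId, double Reading);

public static class SchemaDispatcher
{
    private static readonly JsonSerializerOptions Options =
        new() { PropertyNameCaseInsensitive = true };

    // Peek at the "eventType" discriminator, then deserialize into the matching type.
    public static object Parse(string json)
    {
        using var doc = JsonDocument.Parse(json);
        var type = doc.RootElement.GetProperty("eventType").GetString();

        return type switch
        {
            "order"  => JsonSerializer.Deserialize<OrderEvent>(json, Options)!,
            "sensor" => JsonSerializer.Deserialize<SensorEvent>(json, Options)!,
            _        => throw new NotSupportedException($"Unknown eventType '{type}'"),
        };
    }
}
```

Adding a new schema then only requires a new type and a new switch arm, rather than restructuring a Logic App.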
Additionally, Azure Functions can be combined with Durable Functions to handle long-running processes and complex orchestrations. This approach is useful when you need custom retry and error-handling mechanisms beyond what the trigger's built-in retry policy provides.
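If a full Durable Functions orchestration is more than you need, a small custom retry helper can cover transient failures (for example, SQL Server deadlocks or timeouts). This is a generic sketch not tied to any SDK; the backoff values are illustrative:

```csharp
using System;
using System.Threading.Tasks;

public static class Retry
{
    // Run an async operation, retrying with exponential backoff on failure.
    public static async Task<T> WithBackoffAsync<T>(
        Func<Task<T>> operation,
        int maxAttempts = 5,
        int baseDelayMs = 200)
    {
        for (var attempt = 1; ; attempt++)
        {
            try
            {
                return await operation();
            }
            catch (Exception) when (attempt < maxAttempts)
            {
                // Delay doubles each attempt: 200 ms, 400 ms, 800 ms, ...
                await Task.Delay(baseDelayMs * (1 << (attempt - 1)));
            }
        }
    }
}
```

In production you would typically narrow the `catch` clause to exceptions you know are transient rather than retrying everything.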
Microsoft publishes guidelines on resilient design for Event Hubs and Azure Functions. They cover key measures for building robust event-streaming solutions, including error handling, designing for idempotency, and managing retries when handling large data volumes.
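Designing for idempotency matters here because retries (whether from `FixedDelayRetry` or Event Hubs redelivery) mean the same event can arrive more than once. The usual fix is to key each write on a unique event ID; in SQL Server that is typically a conditional INSERT or MERGE guarded by a unique index. The in-memory sketch below shows the same idea with a hypothetical `ProcessedStore`:

```csharp
using System.Collections.Generic;

public class ProcessedStore
{
    private readonly HashSet<string> _seen = new();

    // Returns true only the first time a given event ID is applied,
    // mirroring an INSERT ... WHERE NOT EXISTS check in SQL Server.
    public bool TryApply(string eventId) => _seen.Add(eventId);
}
```

With this pattern, a redelivered event is recognized by its ID and skipped, so retries never double-insert rows.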