Enhancing Ingestion Rate of Elastic Agent Reading from Azure Event Hub

I have successfully deployed an Elastic Agent that interfaces with an Azure Event Hub. The current configuration processes approximately 5 to 10 million logs per hour, with individual log sizes ranging from 5 to 20 KB.

The system utilizes a virtual machine (VM) to run the Elastic Agent and subsequently transmit the logs to Elastic Cloud. However, I've encountered an issue where the agent does not ingest data as quickly as it's generated by the source, resulting in a consistent lag.

Upon reviewing the VM's resources, I observed that CPU utilization is between 20% and 40% and there is ample memory available. Despite these seemingly sufficient resources, the lag persists.

Could you provide guidance on how to enhance the ingestion rate of the Elastic Agent to better match the pace of data generation from the Azure Event Hub?

The first thing to try would be to make sure you're running at least Elastic Agent 8.12 and to switch that particular agent to the throughput performance preset; see: Using Elastic Agent Performance Presets in 8.12 | Elastic Blog
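For context, the preset is an option on the agent's Elasticsearch output. On a Fleet-managed agent it can be selected in the output's performance tuning settings in Fleet; on a standalone agent it goes in elastic-agent.yml. A minimal sketch, with a placeholder Elastic Cloud endpoint and placeholder credentials:

outputs:
  default:
    type: elasticsearch
    hosts: ["https://my-deployment.es.us-central1.gcp.cloud.es.io:443"]  # placeholder endpoint
    api_key: "id:key"                                                    # placeholder credentials
    preset: throughput  # adjusts workers, batch sizes, and queue settings in favor of throughput over latency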

Beyond that, would you be able to share your Azure Event Hub integration configuration, and outline for us any additional processors or ingest pipelines you may have configured?
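If you do have ingest pipelines configured, one quick way to see how much time they're consuming (assuming you can query the cluster directly, for example from Kibana Dev Tools) is the ingest section of the node stats API, which reports per-pipeline document counts and time spent:

GET _nodes/stats/ingest?filter_path=nodes.*.ingest.pipelines

Heavy grok processing in a pipeline can make the Elasticsearch side the bottleneck even when the agent VM itself looks mostly idle.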

Can you also share a snippet from your agent log so I can see the internal metrics being generated for this input on your agent?

As a general recommendation for pub/sub integrations like Event Hub, we recommend employing multiple smaller nodes to scale throughput; see our recommendations for AWS S3/SQS, which apply to Event Hub as well: Get the most from Elastic Agent with Amazon S3 and SQS | Elastic Blog
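One way to do that with Event Hub (a sketch, not an official reference; the option names below follow the standalone azure-eventhub input, and the Fleet integration exposes the same fields under similar labels): the input checkpoints partition ownership in an Azure Storage container, so several agents configured with the same event hub, consumer group, and storage account will split the partitions among themselves.

- type: azure-eventhub
  eventhub: "logs-hub"                    # placeholder event hub name
  consumer_group: "elastic-agent"         # same consumer group on every agent
  connection_string: "Endpoint=sb://..."  # placeholder namespace connection string
  storage_account: "agentcheckpoints"     # shared checkpoint/ownership store
  storage_account_key: "..."              # placeholder key

Keep in mind that parallelism is capped by the event hub's partition count, so running more agents than there are partitions won't add throughput.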

Besides what @strawgate mentioned, you may need to check on the Azure side whether any throttling is occurring.

Azure may throttle your requests depending on several factors, such as event rate, event size, and the number of throughput units (TUs) you have provisioned.
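As a rough back-of-envelope using the volumes from the original post (assuming a standard-tier namespace, where each TU allows roughly 1 MB/s or 1,000 events/s of ingress and 2 MB/s of egress, and assuming an average event size of about 10 KB):

10,000,000 events/hour x ~10 KB ≈ 100 GB/hour ≈ 28 MB/s of sustained egress
28 MB/s ÷ 2 MB/s egress per TU ≈ 14 TUs

If the namespace has noticeably fewer TUs than that and auto-inflate is off, the consumer will be throttled regardless of how the agent itself is tuned.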

Hi @strawgate - We are using Elastic Agent version 8.17, and below are the performance tuning settings we have applied:

bulk_max_size: 4096
worker: 16
queue.mem.events: 131072
queue.mem.flush.min_events: 4096
queue.mem.flush.timeout: 5s
compression_level: 1
idle_connection_timeout: 15s

We also have an ingest pipeline used to parse the logs (it contains some grok patterns and JSON parsing as well).

Below are the metrics from the Elastic Agent: