I have successfully deployed an Elastic Agent that reads from an Azure Event Hub. The current configuration processes approximately 5 to 10 million logs per hour, with individual log sizes ranging from 5 to 20 KB.
The system utilizes a virtual machine (VM) to run the Elastic Agent and subsequently transmit the logs to Elastic Cloud. However, I've encountered an issue where the agent does not ingest data as quickly as it's generated by the source, resulting in a consistent lag.
Upon reviewing the VM's resources, I observed that CPU utilization is between 20% and 40%, and there is ample memory available. Despite these seemingly sufficient resources, the lag persists.
Could you provide guidance on how to enhance the ingestion rate of the Elastic Agent to better match the pace of data generation from the Azure Event Hub?
Could you share your configuration for the Event Hub integration, and outline any additional processors or ingest pipelines you have configured?
Can you also share a snippet from your agent log so I can see the internal metrics being generated for this input on your agent?
As a general recommendation for pub/sub integrations like Event Hub, we recommend running multiple smaller nodes to scale throughput. See our recommendations for AWS S3/SQS, which apply to Event Hub as well: Get the most from Elastic Agent with Amazon S3 and SQS | Elastic Blog
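For a rough sense of scale, the figures in the question (5 to 10 million logs per hour at 5 to 20 KB each) can be converted into the sustained rate the agents have to match. The sketch below is only back-of-the-envelope arithmetic. It assumes 1 MB = 1000 KB and an even split of load across nodes, and the node count of 4 is a hypothetical value chosen for illustration, not something taken from the original post.

```python
def required_rate(events_per_hour: float, event_kb: float) -> tuple[float, float]:
    """Return (events per second, megabytes per second), assuming 1 MB = 1000 KB."""
    events_per_sec = events_per_hour / 3600
    mb_per_sec = events_per_sec * event_kb / 1000
    return events_per_sec, mb_per_sec


def per_node(rate: tuple[float, float], nodes: int) -> tuple[float, float]:
    """Split a (events/s, MB/s) rate evenly across a number of agent nodes."""
    return rate[0] / nodes, rate[1] / nodes


# Figures from the question: 5-10 million logs/hour, 5-20 KB per log.
low = required_rate(5_000_000, 5)
high = required_rate(10_000_000, 20)

NODES = 4  # hypothetical node count, for illustration only

for label, rate in (("low", low), ("high", high)):
    eps, mbps = rate
    node_eps, node_mbps = per_node(rate, NODES)
    print(f"{label}: {eps:.0f} events/s, {mbps:.1f} MB/s total; "
          f"{node_eps:.0f} events/s, {node_mbps:.1f} MB/s per node")
```

Under these assumptions the total works out to roughly 1,400 to 2,800 events per second and about 7 to 56 MB/s, which is the rate to compare against the internal input metrics in the agent log.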