We are ingesting logs into an AWS S3 bucket and I'm exploring various ingestion mechanisms into Elasticsearch.
After some research, the options come down to Elastic Agent using the S3/SQS input, or the Logstash S3 input. For the Elastic Agent mechanism, we have explored publishing an SQS message whenever a new file is created in the S3 bucket. This is generally recommended because List API calls on S3 are expensive. However, in my personal experience, codifying Elastic Agent deployments is painful.
Logstash seems like a good choice as it is easier to codify and maintain, and it has richer transformation capabilities than Elastic Agent. However, I'm concerned about the performance of the s3 input plugin when there are large volumes of data to ingest.
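For context, here's roughly the Logstash pipeline I'm considering (bucket name, region, prefix, and Elasticsearch host are placeholders):

```
input {
  s3 {
    bucket   => "my-log-bucket"        # placeholder bucket name
    region   => "us-east-1"
    prefix   => "app-logs/"
    # The plugin discovers new objects by listing the bucket on each
    # poll interval, which is the source of my performance concern.
    interval => 60
    codec    => "json"
  }
}

output {
  elasticsearch {
    hosts => ["https://localhost:9200"]
    index => "s3-logs-%{+YYYY.MM.dd}"
  }
}
```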
Can someone share insight on what would be the ideal choice if ingestion performance and throughput is crucial?
Logstash s3 input performance on large buckets is very bad; there are multiple open performance issues about it, because the plugin has no SQS support and has to list the bucket to find new objects.
You need to use Elastic Agent or Filebeat in combination with SQS queues. If you want flexibility for transformations, you can use Elastic Agent/Filebeat just to get the logs from S3 and then send them to Logstash for processing.
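A minimal sketch of that setup with Filebeat's aws-s3 input in SQS mode, forwarding to Logstash (the queue URL and Logstash host are placeholders):

```yaml
# filebeat.yml -- S3 notifications arrive via SQS instead of bucket listing
filebeat.inputs:
  - type: aws-s3
    # SQS queue that receives s3:ObjectCreated:* event notifications
    queue_url: https://sqs.us-east-1.amazonaws.com/123456789012/my-s3-events
    visibility_timeout: 300s

# Ship raw events to Logstash, which handles the heavier transformations
output.logstash:
  hosts: ["logstash:5044"]
```

On the Logstash side you'd then use a beats input on port 5044 and do the parsing there. Multiple Filebeat/Agent instances can consume from the same SQS queue, which is what gives you horizontal scaling.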
You can also use other tools to get the logs from S3 and send them to Logstash, like Vector from Datadog.
Processing data in large buckets requires SQS so you can scale; processing by polling and listing files is very inefficient, and unfortunately the Logstash s3 input does not support SQS.
I have multiple data ingestion flows where I use Vector to get the logs and put them on a Kafka topic, and then use Logstash to process the logs by consuming them from Kafka.
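A rough sketch of that S3 → Vector → Kafka → Logstash flow (queue URL, broker addresses, and topic name are placeholders):

```yaml
# vector.yaml -- reads S3 objects announced via SQS, publishes to Kafka
sources:
  s3_logs:
    type: aws_s3
    region: us-east-1
    sqs:
      queue_url: https://sqs.us-east-1.amazonaws.com/123456789012/log-events

sinks:
  to_kafka:
    type: kafka
    inputs: ["s3_logs"]
    bootstrap_servers: kafka:9092
    topic: s3-logs
    encoding:
      codec: json
```

Logstash then consumes from the topic and does the transformations:

```
input {
  kafka {
    bootstrap_servers => "kafka:9092"
    topics            => ["s3-logs"]
    codec             => "json"
  }
}
```

Kafka in the middle decouples ingestion from processing, so you can scale Vector and Logstash independently and buffer bursts.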
I agree with @leandrojmp regarding Logstash vs. Agent; S3/SQS + horizontally scaling the Agent is probably the preferred approach over Logstash.
Here is a blog that might help a bit.
Note this blog also mentions the Elastic Serverless Forwarder (ESF), which is also an option (note that ESF will eventually be deprecated in favor of the EDOT Cloud Forwarder; see below).
The new EDOT Cloud Forwarder, which currently supports only a limited number of log types, is also an option, especially if your logs are already of a supported type. I understand the log type support will be greatly expanded in the future.