We have two AWS Logstash instances sending Beats traffic to two AWS S3 buckets using the s3 output plugin, but we are seeing high outbound network traffic.
I'd expect roughly a 1:2 inbound-to-outbound ratio, since we are sending the data to two S3 buckets, but we are seeing around a 1:5 ratio. Is this normal behaviour for the s3 output plugin, or are there other options we can add to limit this output?
We are seeing this via our infrastructure monitoring tools and AWS portal monitoring.
Logstash events handled in/out are at a 1:1 ratio, and the instances can have 1-2k active connections throughout the day.
Example of monitoring metrics at a specific point in time:
network bytes in: 110 MB
network bytes out: 2.3 GB
network packets in: 250k
network packets out: 940k
The communication between Beats and Logstash is compressed by default, but when Logstash writes to S3, the data is not compressed.
I do not use this output myself, but from the documentation it seems you can change the encoding option to gzip, which could reduce the volume of outbound traffic; the objects stored in S3 would then be gzipped.
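A minimal sketch of an s3 output using that option; the bucket name and region below are placeholders, not values from the original question:

```
output {
  s3 {
    region   => "us-east-1"          # placeholder region
    bucket   => "my-logstash-bucket" # placeholder bucket name
    encoding => "gzip"               # compress each object before upload,
                                     # cutting outbound bytes to S3
  }
}
```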
Why is the outbound volume larger?
As Leandro said, Filebeat (FB) sends messages as raw lines, unparsed, plus a few FB fields. Logstash (LS) then adds several more fields on top: your transformations, the raw FB message, and the event, ECS, and S3 metadata fields. You can try enabling gzip and removing unnecessary fields. To remove them:
First create an index/template with all fields and test with only 5-10 messages. Inspect the events with the rubydebug codec (and set LS log.level=debug if you need more detail), then remove the unnecessary fields in LS. For unknown fields such as event, check the ECS reference.
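One way to do that inspection is a temporary stdout output with the rubydebug codec, which prints each event with every field so you can see exactly what Logstash adds before it reaches S3:

```
output {
  # temporary debug output: prints each event, including every
  # field Beats and Logstash have added, in a readable format
  stdout { codec => rubydebug }
}
```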
Be aware that you can lose important fields that might be useful later: host, message/event, etc.
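A minimal sketch of removing fields with the mutate filter; the field names here are only examples, so drop nothing until the debug output confirms you do not need it:

```
filter {
  mutate {
    # example field names only; remove fields only after confirming
    # in the rubydebug output that your pipeline does not need them
    remove_field => ["agent", "ecs", "input", "log"]
  }
}
```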