I'm seeing a significant performance drop when using codec => "json" with the Kinesis input plugin, and I'd like to know whether anyone has recommendations for speeding up processing.
On a 4 CPU, 16 GB box running RedHat with -Xmx and -Xms both set to 8 GB, the following config processes 50,000+ events per minute with 4 workers and a batch size of 10,000 into an 8-node Elasticsearch cluster. This is without the codec setting.
input {
  kinesis {
    kinesis_stream_name => "kinesis-stream"
    application_name => "logstash-kinesis-poc"
    region => "us-east-1"
    profile => "default"
  }
}
output {
  stdout { codec => rubydebug }
  elasticsearch {
    hosts => ["http://elastic_box_one:9200","http://elastic_box_two:9200"]
    doc_as_upsert => true
    template_overwrite => false
    index => "kinesis-poc-%{+YYYY-MM-dd}"
    template_name => "poc"
  }
}
When type and codec are added to the input, throughput drops to 2,100+ events per minute with the same 4 workers and batch size of 10,000.
input {
  kinesis {
    kinesis_stream_name => "kinesis-stream"
    application_name => "logstash-kinesis-poc"
    region => "us-east-1"
    profile => "default"
    type => "message"
    codec => "json"
  }
}
output {
  stdout { codec => rubydebug }
  elasticsearch {
    hosts => ["http://elastic_box_one:9200","http://elastic_box_two:9200"]
    doc_as_upsert => true
    template_overwrite => false
    index => "kinesis-poc-%{+YYYY-MM-dd}"
    template_name => "poc"
  }
}
Throughput increases to about 5,000 events per minute when the workers are left at 4 and the batch size is reduced to 100.
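For reference, a sketch of how the worker and batch settings in these runs can be passed on the command line. The config file name kinesis-poc.conf is a placeholder, and this assumes a Logstash version that accepts the short -w (pipeline workers) and -b (pipeline batch size) flags.

```sh
# Original runs: 4 workers, batch size of 10,000
# (kinesis-poc.conf is a placeholder name for the config shown above)
bin/logstash -f kinesis-poc.conf -w 4 -b 10000

# Follow-up run: same 4 workers, batch size reduced to 100
bin/logstash -f kinesis-poc.conf -w 4 -b 100
```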
We also tried removing the codec and parsing the message field with a json filter instead, but that showed the same drop in throughput as using codec => "json" in the input.
filter {
  json {
    source => "message"
  }
}