As per our Prometheus metrics, only 1 Logstash pod is processing the Kinesis input, so it looks like we need to vertically scale the pods in this case.
Can multiple Logstash instances not work in parallel with the Kinesis input?
@deepak_deore - what is the shard count of the Kinesis data stream from which your logstash pods are consuming the messages?
There is only 1 shard.
So is it a 1 shard == 1 Logstash calculation?
Yes, that is how the Kinesis Client Library (KCL), which the Logstash Kinesis input plugin uses, seems to work. Please see this page and give resharding (increasing the shard count) a try: https://docs.aws.amazon.com/streams/latest/dev/kinesis-record-processor-scaling.html
Please keep in mind that increasing the shard count has cost implications - https://aws.amazon.com/kinesis/data-streams/pricing/
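In case it helps, here is a minimal sketch of what resharding could look like from Python. It assumes you pass in a boto3 Kinesis client (`boto3.client("kinesis")`) and uses its `update_shard_count` call with uniform scaling; the helper name `reshard` and the stream name in the usage comment are made up for illustration, so adjust them to your setup.

```python
def reshard(kinesis_client, stream_name, target_shard_count):
    """Ask Kinesis to change the number of open shards in a stream.

    kinesis_client is assumed to be a boto3 Kinesis client.
    stream_name and target_shard_count are supplied by the caller.
    """
    if target_shard_count < 1:
        raise ValueError("a stream needs at least one shard")
    # UNIFORM_SCALING splits or merges shards so that they end up evenly sized.
    return kinesis_client.update_shard_count(
        StreamName=stream_name,
        TargetShardCount=target_shard_count,
        ScalingType="UNIFORM_SCALING",
    )

# Hypothetical usage (needs boto3 and AWS credentials):
#   import boto3
#   reshard(boto3.client("kinesis"), "my-log-stream", 2)
```

With 2 shards, a second Logstash pod using the same application_name should be able to pick up a lease for the second shard.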
thanks @Rahul_Kumar4, it worked
btw... Logstash uses application_name to save the state in DynamoDB. Do you know if we can run 2 Logstash instances with 1 Kinesis shard but set a different application_name on each? That way both Logstash instances would work independently on a single Kinesis shard.
That would not work. Having two different application_names would mean two separate DynamoDB tables would be created to track the checkpointing, each independent of the other. So unless you find a way to share state between these two tables, the two instances would not know how far the records in that single shard have been processed. You may actually end up processing every record twice.
You could do with just a single Logstash instance but if you are looking for scaling, you would have to increase the shard count.
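To make the double-processing point concrete, here is a toy model in Python. It is only a sketch of the idea, not how the KCL or the Logstash plugin is implemented: the dict `checkpoints` stands in for the per-application DynamoDB table, and the names `logstash-a` and `logstash-b` are hypothetical application_name values.

```python
def consume(application_name, checkpoints, shard_records):
    """Read from this application's own checkpoint to the end of the shard."""
    start = checkpoints.get(application_name, 0)
    processed = shard_records[start:]
    # Each application_name only ever updates its own checkpoint.
    checkpoints[application_name] = len(shard_records)
    return processed

records = ["r1", "r2", "r3"]  # records sitting in the single shard
checkpoints = {}              # stand-in for one checkpoint table per application_name

out_a = consume("logstash-a", checkpoints, records)
out_b = consume("logstash-b", checkpoints, records)
print(out_a)  # ['r1', 'r2', 'r3']
print(out_b)  # ['r1', 'r2', 'r3'] again: nothing was shared with logstash-a
```

Because neither consumer sees the other's checkpoint, both read the whole shard, so downstream you get every event twice instead of getting the work split.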
This passage from that link gives more context:
Typically, when you use the KCL, you should ensure that the number of instances does not exceed the number of shards (except for failure standby purposes). Each shard is processed by exactly one KCL worker and has exactly one corresponding record processor, so you never need multiple instances to process one shard. However, one worker can process any number of shards, so it's fine if the number of shards exceeds the number of instances.
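As a rough illustration of that paragraph, the Python sketch below hands each shard to exactly one worker. The round-robin assignment is an assumption made for simplicity and is not the KCL's actual lease-balancing logic; the shard and pod names are made up.

```python
def assign_shards(shards, workers):
    """Toy model: lease each shard to exactly one worker, round-robin."""
    assignment = {worker: [] for worker in workers}
    for i, shard in enumerate(shards):
        assignment[workers[i % len(workers)]].append(shard)
    return assignment

pods = ["logstash-0", "logstash-1"]

# 1 shard, 2 pods: the second pod has nothing to do (standby only).
one_shard = assign_shards(["shard-0"], pods)
print(one_shard)    # {'logstash-0': ['shard-0'], 'logstash-1': []}

# 4 shards, 2 pods: each pod processes two shards.
four_shards = assign_shards([f"shard-{i}" for i in range(4)], pods)
print(four_shards)  # {'logstash-0': ['shard-0', 'shard-2'], 'logstash-1': ['shard-1', 'shard-3']}
```

This matches what the Prometheus metrics showed: with a single shard, only one pod ever does work, no matter how many pods are running.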
thanks for the info, clear now