Logstash as a Statefulset in Kubernetes - File Input and duplicated logs

Hello,
I am trying to deploy a multi-pod Logstash StatefulSet on a Kubernetes cluster using the file input.
It looks like each pod is reading the same logs from the log file placed on a PVC, so we are getting duplicated logs in our Elastic instance.
E.g.: 2 pods running --> 2 copies of each log being posted to Elastic.

Any hints on the configuration to get this solved?

Thanks!

Hello and welcome,

You will need to share your Logstash configuration; you didn't include it.

You cannot have two or more different instances reading from the same log files; this will lead to duplication.

You are right.

  logstash.yml: |
    http.host: "0.0.0.0"
    path.config: /usr/share/logstash/pipeline
    queue.type: persisted
    queue.max_bytes: 1gb

  pipeline.conf: |
    input {
      file {
        path => "/var/app-logs/logs/*"
        exclude => "*.gz"
        mode => "tail"
        start_position => "end"
        codec => "json"
        sincedb_clean_after => "7"
        sincedb_path => "/var/mes-logs/logstash-sincedb"
      }
    }

I am not sure what the issue is.

If you have 2 or more Logstash instances reading the same logs, they will be duplicated.

While you can use a custom document id to avoid duplication on the Elasticsearch side, I would not recommend it in this case.
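For reference, a custom document id is usually built by hashing the event with the fingerprint filter and passing the result to the elasticsearch output; this is only a sketch, and the hosts value is a placeholder, not taken from your setup:

```
filter {
  fingerprint {
    # hash the raw message so identical events get identical ids
    source => "message"
    target => "[@metadata][fingerprint]"
    method => "SHA256"
  }
}

output {
  elasticsearch {
    hosts => ["https://localhost:9200"]
    # duplicate events overwrite each other instead of creating new docs
    document_id => "%{[@metadata][fingerprint]}"
  }
}
```

This turns duplicate inserts into overwrites of the same document, which hides the duplication but still wastes the work of processing each event twice.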

Use just one instance.

Then does that mean Logstash can't be deployed as a StatefulSet in Kubernetes and made highly available with more than one pod running?
The idea of using multiple pods was to improve performance and be able to scale horizontally.
I can't see any info about this in the Logstash Helm chart: helm-charts/logstash at main · elastic/helm-charts · GitHub

I do not use Kubernetes, but to have an HA deployment of Logstash you need third-party tools, and it also depends on your input.

For example, if you are receiving data using a TCP or UDP input, you can put a load balancer in front of Logstash and then run as many Logstash instances as you want. Likewise, if you are consuming data from Kafka, you can also run multiple Logstash instances.
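A Kafka-based setup can be sketched like this (the broker address, topic, and group names are placeholders); every Logstash pod that shares the same `group_id` joins one consumer group, so Kafka assigns each pod a distinct subset of the partitions and no event is read twice:

```
input {
  kafka {
    bootstrap_servers => "kafka:9092"
    topics => ["app-logs"]
    # all pods in this group split the partitions between them
    group_id => "logstash"
    codec => "json"
  }
}
```

Scaling is then just adding pods, up to the number of partitions in the topic.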

But for the file input you need to read the file and track the position already read, so having 2 or more tools doing this adds a lot of unnecessary complexity; that's one of the reasons you should have just one tool reading the files.

Logstash alone has no support for any kind of HA deployment.

As mentioned, it depends on the input, but also, most of the time the performance issues or bottlenecks are not on the Logstash side but on the receiving side, so scaling Logstash horizontally may not help at all and can in some scenarios make things worse.

Also, as mentioned, Logstash has no support for HA on its own; it needs third-party tools, and your data needs to use specific inputs that allow load balancing, for example.

Thanks for your feedback!
I currently have a setup with an HTTP input that works with an ingress handling the load balancing, but I wanted to reduce the HTTP calls inside my cluster with this file input. If I can't scale, though... I need to double-check what works better in my scenario.

Again, thanks for your input @leandrojmp !