Logstash ingesting from S3, affecting OTHER indices?

I'm having a strange issue. I'm using the pipelines feature of Logstash via 2 config files that look like this:

input {
    s3 {
        bucket => "<BUCKETNAME>"
        region => "us-east-1"
        codec => "json"
        additional_settings => {
            force_path_style => true
            follow_redirects => false
        }
    }
}
output {
    elasticsearch {
        hosts => "http://localhost:9200"
        index => "test.<CLIENTNAME>.output-%{+YYYY.MM}"
        user => logstash_internal
        password => XXXXX
    }
}

to ingest data from S3.

However, these two .conf files each reference a different bucket. What I don't understand is that, when I run Logstash, the indices appear to be sharing data even though each index should correspond to a single S3 bucket. As one gets bigger, so does the other.

How do I keep this from happening and ensure that files from their respective buckets end up in their corresponding index?

Can you share your pipelines.yml file?

Sure thing.

It's literally just:

- pipeline.id: main
  path.config: "/etc/logstash/conf.d/*.conf"

With this configuration you have just one pipeline; when Logstash starts, it will merge all the files in the /etc/logstash/conf.d path as if they were a single file.

Since you are not using conditionals in your outputs, the data from both inputs will be sent to both outputs. You need to change the pipelines.yml file to use multiple pipelines.

Something like this:

- pipeline.id: pipeline-one
  path.config: "/etc/logstash/conf.d/pipeline-one.conf"

- pipeline.id: pipeline-two
  path.config: "/etc/logstash/conf.d/pipeline-two.conf"

This will make Logstash run both pipelines independently and in isolation; the events of one pipeline won't be visible to the other pipeline.
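
As an aside, if you did want to keep everything in a single pipeline, the conditional approach mentioned above would look something like this. This is just a minimal sketch: the bucket names, tags, and index names are placeholders, and the user/password settings are omitted for brevity.

input {
    s3 {
        bucket => "client-one-bucket"
        region => "us-east-1"
        codec => "json"
        # tag events from this bucket so the output can route them
        tags => ["client-one"]
    }
    s3 {
        bucket => "client-two-bucket"
        region => "us-east-1"
        codec => "json"
        tags => ["client-two"]
    }
}
output {
    # route each event to its own index based on the input tag
    if "client-one" in [tags] {
        elasticsearch {
            hosts => "http://localhost:9200"
            index => "test.client-one.output-%{+YYYY.MM}"
        }
    } else if "client-two" in [tags] {
        elasticsearch {
            hosts => "http://localhost:9200"
            index => "test.client-two.output-%{+YYYY.MM}"
        }
    }
}

But since your two configs are otherwise independent, separate pipelines are the cleaner option here.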


Ahh, I had no idea that was how it worked. Thanks so much!

As a follow-up, if I wanted to re-ingest everything I already ingested, would I need to delete the sincedb files and restart logstash?

You would need to stop Logstash, remove the sincedb files created by the s3 input, and start it again. The sincedb file for the s3 input basically just stores the date of the last object that was read from the bucket.
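
A minimal sketch of that sequence, assuming Logstash runs as a systemd service with the default path.data of /var/lib/logstash (the sincedb location will differ if you set sincedb_path or a custom path.data):

# stop the service so the sincedb files are no longer being written to
sudo systemctl stop logstash

# remove the sincedb files created by the s3 input
# (assumed default location under path.data; adjust for your setup)
sudo rm /var/lib/logstash/plugins/inputs/s3/sincedb_*

# start Logstash again; it will re-read the buckets from the beginning
sudo systemctl start logstash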

