Logstash ingesting from S3, affecting OTHER indices?

I'm having a strange issue. I'm using the pipelines feature of Logstash via 2 config files that look like this:

input {
    s3 {
        bucket => "<BUCKETNAME>"
        region => "us-east-1"
        codec => "json"
        additional_settings => {
            force_path_style => true
            follow_redirects => false
        }
    }
}
output {
    elasticsearch {
        hosts => "http://localhost:9200"
        index => "test.<CLIENTNAME>.output-%{+YYYY.MM}"
        user => logstash_internal
        password => XXXXX
    }
}

to ingest data from S3.

However, these two .conf files each reference a different bucket. What I don't understand is that, when I run Logstash, the indices appear to be sharing data even though each index should correspond to a single S3 bucket. As one gets bigger, so does the other.

How do I keep this from happening and ensure that files from their respective buckets end up in their corresponding index?

Can you share your pipelines.yml file?

Sure thing.

It's literally just:

- pipeline.id: main
  path.config: "/etc/logstash/conf.d/*.conf"

With this configuration you have just one pipeline; when Logstash starts, it will merge all the files in the /etc/logstash/conf.d path as if they were a single file.

Since you are not using conditionals in your outputs, the data from both inputs will be sent to both outputs. You need to change the pipelines.yml file to use multiple pipelines.

Something like this:

- pipeline.id: pipeline-one
  path.config: "/etc/logstash/conf.d/pipeline-one.conf"

- pipeline.id: pipeline-two
  path.config: "/etc/logstash/conf.d/pipeline-two.conf"

This will make Logstash run both pipelines independently and in isolation; the events of one pipeline won't be visible to the other pipeline.
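
As an aside, if you did want to keep everything in a single pipeline, the conditional approach mentioned above would look something like this. This is just a minimal sketch: the bucket names, tags, and index names are placeholders, and the user/password settings are omitted for brevity.

input {
    s3 {
        bucket => "client-one-bucket"
        region => "us-east-1"
        codec => "json"
        # tag events from this bucket so the output can route them
        tags => ["client-one"]
    }
    s3 {
        bucket => "client-two-bucket"
        region => "us-east-1"
        codec => "json"
        tags => ["client-two"]
    }
}
output {
    # route each event to its own index based on the input tag
    if "client-one" in [tags] {
        elasticsearch {
            hosts => "http://localhost:9200"
            index => "test.client-one.output-%{+YYYY.MM}"
        }
    } else if "client-two" in [tags] {
        elasticsearch {
            hosts => "http://localhost:9200"
            index => "test.client-two.output-%{+YYYY.MM}"
        }
    }
}

But since your two configs are otherwise independent, separate pipelines are the cleaner option here.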


Ahh, I had no idea that was how it worked. Thanks so much!

As a follow-up, if I wanted to re-ingest everything I already ingested, would I need to delete the sincedb files and restart logstash?

You would need to stop Logstash, remove the sincedb files created by the s3 input, and start it again. The sincedb file for the s3 input basically just stores the date of the last object that was read from the bucket.
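
A minimal sketch of that sequence, assuming Logstash runs as a systemd service with the default path.data of /var/lib/logstash (the sincedb location will differ if you set sincedb_path or a custom path.data):

# stop the service so the sincedb files are no longer being written to
sudo systemctl stop logstash

# remove the sincedb files created by the s3 input
# (assumed default location under path.data; adjust for your setup)
sudo rm /var/lib/logstash/plugins/inputs/s3/sincedb_*

# start Logstash again; it will re-read the buckets from the beginning
sudo systemctl start logstash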

