Data from one index is appearing in other indices

I seem to be having a strange problem, though it may well be a misconfiguration on my side.

We are trying to monitor AWS S3 logs which have the following structure

s3://bucket/projectA
s3://bucket/projectB
s3://bucket/projectC
s3://bucket/projectN
etc..

We have Logstash configured with a separate configuration file for each of these projects, and some of the grok expressions differ based on the type of load balancer the project uses. For example we have,

/etc/logstash/conf.d/projectA.conf
/etc/logstash/conf.d/projectB.conf
/etc/logstash/conf.d/projectC.conf
/etc/logstash/conf.d/projectN.conf etc..

If I start the Logstash service it reads all the conf files and populates the indices, but sometimes data from ProjectA is seen in ProjectC and so on.
I did a fresh start running only one configuration file at a time, and that seems to keep the data in its own indices.

Do we need to configure in a different way for a requirement like this?

What does a sample Logstash conf look like?
Because Logstash will merge all of those files into one big one at run time, unless you use pipelines or something else to segregate things.

Thank you for your response. This is one of the project files under conf.d/ named projectA.conf,

input {
    s3 {
        bucket => "load-balancer-logs"
        prefix => "ProjectA"
        region => "us-west-2"
        add_field => {
            "doctype" => "aws-application-load-balancer-for-projectA"
        }
    }
}

filter {
    grok { statements }
}

output {
    elasticsearch {
        hosts => [ "http://localhost:9200" ]
        index => "alb-index-projectA-%{+YYYY.MM.dd}"
        #user => "user"
        #password => "password"
    }
}

My pipelines.yml has nothing but default entries,

- pipeline.id: main
  path.config: "/etc/logstash/conf.d/*.conf"

So all those conf files are merged / concatenated, i.e. they all become a single pipeline, so in each conf you need an if {} block to make sure you send the right docs to the right index.

You should do the same for your grok filters too.

Something like

output {
    if [prefix] == "ProjectA" {
        elasticsearch {
            hosts => [ "http://localhost:9200" ]
            index => "alb-index-projectA-%{+YYYY.MM.dd}"
            #user => "user"
            #password => "password"
        }
    }
}
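
The same conditional pattern applies to the filters. A minimal sketch (the pattern here is a placeholder, and it assumes each event carries a `prefix` field, e.g. set via `add_field` in the input):

```
filter {
    if [prefix] == "ProjectA" {
        grok {
            # ProjectA's ALB grok pattern goes here (placeholder match)
            match => { "message" => "%{GREEDYDATA:raw_log}" }
        }
    }
}
```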

The other way to do it is to make each conf its own pipeline, by naming each one separately in pipelines.yml as a distinct pipeline.


You can also use sprintf references to make things a lot simpler;

input {
    s3 {
        bucket => "load-balancer-logs"
        prefix => "ProjectA"
        region => "us-west-2"
        add_field => {
            "doctype" => "aws-application-load-balancer-for-%{prefix}"
        }
    }
}

filter {
    grok { statements }
}

output {
    elasticsearch {
        hosts => [ "http://localhost:9200" ]
        index => "alb-index-%{prefix}-%{+YYYY.MM.dd}"
        #user => "user"
        #password => "password"
    }
}

But your groks may be different.
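
One caveat worth checking against your own events: as far as I know, the s3 input's `prefix` option only selects which objects to fetch; it is not copied into the event as a field, so a sprintf reference like `%{prefix}` would come through literally unless you set the field yourself, for example:

```
input {
    s3 {
        bucket => "load-balancer-logs"
        prefix => "ProjectA"
        region => "us-west-2"
        add_field => {
            # set the field explicitly so %{prefix} resolves in the output
            "prefix" => "ProjectA"
            "doctype" => "aws-application-load-balancer-for-ProjectA"
        }
    }
}
```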


This is great! I opted for multiple pipelines and it seems to work.

# This file is where you define your pipelines. You can define multiple.
# For more information on multiple pipelines, see the documentation:
#   https://www.elastic.co/guide/en/logstash/current/multiple-pipelines.html

#- pipeline.id: main
#  path.config: "/etc/logstash/conf.d/*.conf"
- pipeline.id: ProjectA-pipeline
  path.config: "/etc/logstash/conf.d/ProjectA.conf"
- pipeline.id: ProjectB-pipeline
  path.config: "/etc/logstash/conf.d/ProjectB.conf"
- pipeline.id: ProjectN-pipeline
  path.config: "/etc/logstash/conf.d/ProjectN.conf"

You guys are geniuses!
