Data from one index is appearing in other indices

I seem to be having a strange problem, though it may well be a misconfiguration on my side.

We are trying to monitor AWS S3 logs which have the following structure

s3://bucket/projectA
s3://bucket/projectB
s3://bucket/projectC
s3://bucket/projectN
etc..

We have Logstash configured with a separate configuration file for each of these projects, and some of the grok expressions differ based on the type of load balancer the project uses. For example we have,

/etc/logstash/conf.d/projectA.conf
/etc/logstash/conf.d/projectB.conf
/etc/logstash/conf.d/projectC.conf
/etc/logstash/conf.d/projectN.conf etc..

If I start the Logstash service it reads all the conf files and populates the indices, but sometimes data from ProjectA is seen in ProjectC and so on.
I did a fresh start running only one configuration file at a time, and that seems to keep the data in its own indices.

Do we need to configure in a different way for a requirement like this?

What does a sample Logstash conf look like?
Because Logstash will merge all of those files into one big one at run time, unless you use pipelines or something else to segregate things.

Thank you for your response. This is one of the project files under conf.d/ named projectA.conf,

input {
    s3 {
        bucket => "load-balancer-logs"
        prefix => "ProjectA"
        region => "us-west-2"
        add_field => {
            "doctype" => "aws-application-load-balancer-for-projectA"
        }
    }
}

filter {
    grok { statements }
}

output {
    elasticsearch {
        hosts => [ "http://localhost:9200" ]
        index => "alb-index-projectA-%{+YYYY.MM.dd}"
        #user => "user"
        #password => "password"
    }
}

My pipelines.yml has nothing but default entries,

- pipeline.id: main
  path.config: "/etc/logstash/conf.d/*.conf"

So all those conf files are merged / concatenated, i.e. they all become a single pipeline, so in each conf you need an if {} block to make sure you send the right docs to the right index.

You should do the same for your grok filters too.

Something like

output {
    if [prefix] == "ProjectA" {
        elasticsearch {
            hosts => [ "http://localhost:9200" ]
            index => "alb-index-projectA-%{+YYYY.MM.dd}"
            #user => "user"
            #password => "password"
        }
    }
}
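
The same conditional pattern applies to the filters. A minimal sketch (the pattern here is a placeholder, and it assumes each event carries a `prefix` field, e.g. set via `add_field` in the input):

```
filter {
    if [prefix] == "ProjectA" {
        grok {
            # ProjectA's ALB grok pattern goes here (placeholder match)
            match => { "message" => "%{GREEDYDATA:raw_log}" }
        }
    }
}
```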

The other way to do it is to make each conf its own pipeline, by naming each one separately in pipelines.yml as a distinct pipeline.


You can also use sprintf references to make things a lot simpler;

input {
    s3 {
        bucket => "load-balancer-logs"
        prefix => "ProjectA"
        region => "us-west-2"
        add_field => {
            "doctype" => "aws-application-load-balancer-for-%{prefix}"
        }
    }
}

filter {
    grok { statements }
}

output {
    elasticsearch {
        hosts => [ "http://localhost:9200" ]
        index => "alb-index-%{prefix}-%{+YYYY.MM.dd}"
        #user => "user"
        #password => "password"
    }
}

But your groks may be different.
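
One caveat worth checking against your own events: as far as I know, the s3 input's `prefix` option only selects which objects to fetch; it is not copied into the event as a field, so a sprintf reference like `%{prefix}` would come through literally unless you set the field yourself, for example:

```
input {
    s3 {
        bucket => "load-balancer-logs"
        prefix => "ProjectA"
        region => "us-west-2"
        add_field => {
            # set the field explicitly so %{prefix} resolves in the output
            "prefix" => "ProjectA"
            "doctype" => "aws-application-load-balancer-for-ProjectA"
        }
    }
}
```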


This is great! I opted for multiple pipelines and it seems to work.

# This file is where you define your pipelines. You can define multiple.
# For more information on multiple pipelines, see the documentation:
#   https://www.elastic.co/guide/en/logstash/current/multiple-pipelines.html

#- pipeline.id: main
#  path.config: "/etc/logstash/conf.d/*.conf"
- pipeline.id: ProjectA-pipeline
  path.config: "/etc/logstash/conf.d/ProjectA.conf"
- pipeline.id: ProjectB-pipeline
  path.config: "/etc/logstash/conf.d/ProjectB.conf"
- pipeline.id: ProjectN-pipeline
  path.config: "/etc/logstash/conf.d/ProjectN.conf"

You guys are geniuses!
