Unexpected indices being created by Logstash

I'm using Logstash to ingest logs stored on AWS S3. My configuration file looks like this:

input {
        s3 {
                "access_key_id" => "REDACTED"
                "secret_access_key" => "REDACTED"
                "bucket" => "REDACTED"
                "exclude_pattern" => "^(?!lin-gitlab-gl-stage-\d+-\d+-\d+-\d+\/gitlab-workhorse\/(current|\S+.s))"
                "gzip_pattern" => "\.s?$"
                "region" => "ap-southeast-1"
                "id" => "gitlab-workhorse"
                "sincedb_path" => "/var/lib/logstash/plugins/inputs/s3/sincedb_gitlab_workhorse"
                "type" => "workhorse"
        }
        s3 {
                "access_key_id" => "REDACTED"
                "secret_access_key" => "REDACTED"
                "bucket" => "REDACTED"
                "exclude_pattern" => "^(?!lin-gitlab-gl-stage-\d+-\d+-\d+-\d+\/gitlab-rails\/api_json.log(.\d+.gz)?)"
                "gzip_pattern" => "\.gz?$"
                "region" => "ap-southeast-1"
                "id" => "gitlab-rails"
                "sincedb_path" => "/var/lib/logstash/plugins/inputs/s3/sincedb_gitlab_rails"
                "type" => "api"
        }
}

filter {
        json {
                source => "message"
        }
        if [remote_ip] == "127.0.0.1" {
                drop {}
        }
}

output {
        amazon_es {
                hosts => ["REDACTED"]
                region => "us-east-1"
                index => "stage-gitlab-%{type}-%{+YYYY.MM.dd}"
        }
}

So, given that I'm explicitly using type as part of the index name, I would expect this configuration to create only indices whose names start with stage-gitlab-api or stage-gitlab-workhorse.

However, I'm seeing names where the "type" portion of the index name is "w", "ssa", "ss", "slo", "seccft" and so on, e.g. stage-gitlab-seccft-2020.11.04.

Can anyone explain why this is happening, please?

Hello Philip,

Is it possible that the type already exists in the logs? According to the docs here:

If you try to set a type on an event that already has one (for example when you send an event from a shipper to an indexer) then a new input will not override the existing type. A type set at the shipper stays with that event for its life even when sent to another Logstash server.

Best regards
Wolfram

Oh ... do you mean that "type" can be inferred by Logstash from the logs themselves?

Yes. From what I understand, your pipeline reads log files from S3 and parses them as JSON. If your JSON contains a type field at the root level, the type set in your input will be overridden.
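
One way to rule this out is to keep the parsed JSON away from the event root. The following is a minimal sketch, assuming the json filter's target option is acceptable for your pipeline; the field name "parsed" is an arbitrary choice and not part of the original configuration. With a target set, a type key inside a log line ends up as [parsed][type] and cannot replace the type set by the s3 input. The other parsed fields move as well, so the remote_ip check has to reference the nested field.

```
filter {
        json {
                source => "message"
                # Nest the parsed keys under [parsed] instead of the event root,
                # so a "type" key in the log line cannot overwrite [type].
                # "parsed" is a hypothetical field name chosen for this sketch.
                target => "parsed"
        }
        if [parsed][remote_ip] == "127.0.0.1" {
                drop {}
        }
}
```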

What is really weird is that if I try to create an index pattern so that I can dig into this deeper, those "strange" indices don't show up:

I'll have a look at the underlying log files to see if there is a "type" attribute in there.
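
While checking the files, it may also help to let Logstash show the offending events directly. This is a rough sketch, assuming the conditional is placed inside the existing output section next to amazon_es and that api and workhorse are the only expected type values. Any event with a different type is printed with the rubydebug codec, which lists every field of the event, so the source of the unexpected value should be visible.

```
output {
        # Print only events whose type is not one of the two values set in the inputs.
        if [type] not in ["api", "workhorse"] {
                stdout { codec => rubydebug }
        }
}
```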

Normally, you can page through the index list using the controls at the lower left:

Update:
In your screenshot there is one of the "strange" indices.

That is the only one, though. The indices are listed in alphabetical order and I picked that screenshot to show that it jumps from api to auth0_managed_certs to workhorse, bypassing all of the unexpected indices.

So you see the indices in the index list but not when creating an index pattern?

Do you have hidden indices?

Yes

Not that I'm aware of. These indices are being created by Logstash so I don't know whether they are hidden or not. Sorry - I'm very much an Elastic newbie.

I've looked at some of the log files now and none of them have "type" in them. However, I will change the configuration to use a phrase that is not under any circumstances going to appear in the files.
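
A sketch of that change, assuming a field under [@metadata] is used; the name index_type is made up for this example. add_field is a common option on Logstash inputs, and @metadata fields are not sent to the output, so the value is only used to build the index name. As long as the log lines have no @metadata key of their own, the json filter has nothing to overwrite it with. Only one of the two s3 inputs is shown, with its other settings left as they are.

```
input {
        s3 {
                # ... existing settings unchanged ...
                "id" => "gitlab-workhorse"
                # Hypothetical field name; kept out of the indexed document.
                add_field => { "[@metadata][index_type]" => "workhorse" }
        }
}

output {
        amazon_es {
                hosts => ["REDACTED"]
                region => "us-east-1"
                index => "stage-gitlab-%{[@metadata][index_type]}-%{+YYYY.MM.dd}"
        }
}
```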
