Index logic documentation?

I am looking for a writeup on when/how Elasticsearch decides to create new indices.

I listed all the indices on my single-node ELK stack server, expecting to find exactly two: the one fed by Logstash and the one created by Kibana. Instead I found literally dozens of Logstash indices split by date, all open, with no apparent rhyme or reason to when they were created. (The Kibana one was there too, so at least I got that part right.)

Where can I read up on this so I can stop it (or at least control it)?

The only time Elasticsearch creates an index is when it is asked to.
That may be an explicit create-index request (for example, one that supplies a mapping), or an implicit one: indexing a document into an index that doesn't exist yet creates that index automatically.

What does _cat/indices?v show?

This is what _cat/indices?v shows:

All yellow (expected, since this is a single node)
All open (?)
All from Logstash (they all follow the naming convention "logstash-<date>")
All with 5 primary shards (pri)
All with 1 replica (rep)
Doc count all over the place: as small as 11k, as large as 27k
Store size all over the place: 74 MB to 105 MB

Then Logstash is what's creating them.

OK, so Logstash is creating them. Presumably each index holds a distinct subset of the data I ingest. Since the names give me no clues, how do I determine which data ends up in which index? I didn't knowingly ask for any indices to be created; whatever happened was triggered by Logstash at some date AFTER I plugged in the pipeline.
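One way to see which index each event is routed to is to build the same name into a metadata field and print it. This is a minimal debugging sketch, assuming the elasticsearch output is using its default name pattern `logstash-%{+YYYY.MM.dd}`; the field name `[@metadata][target_index]` is hypothetical, chosen only for illustration, and the `stdout` output would be added temporarily next to the existing elasticsearch output:

```
filter {
  # Hypothetical helper field: rebuild the default index name for this event.
  # %{+YYYY.MM.dd} is formatted from the event's @timestamp (in UTC).
  mutate {
    add_field => { "[@metadata][target_index]" => "logstash-%{+YYYY.MM.dd}" }
  }
}

output {
  # rubydebug hides @metadata fields unless metadata => true is set
  stdout { codec => rubydebug { metadata => true } }
}
```

Fields under `@metadata` are not sent to Elasticsearch, so the helper field does not end up in the stored documents.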

Better yet, how can I tell logstash to give me an index name that is actually meaningful?

I think you'd need to share your Logstash config; it will help us understand what's going on.

The obvious guess is that a new index request happens every time the Logstash thread is restarted. Is there a way to tell Logstash NOT to ask for a new index and instead keep feeding the latest one?

I am happy to post my pipeline, but it isn't very exciting.

That's unlikely.

If you can post your config it'll help immensely.

input {
    file {
        type => "bbb-web"
        path => [
            "/data/logstash/logs/*conf*{{ logstash_path_qualifier }}/bbb-web.log"
        ]
    }

    file {
        type => "freeswitch-log"
        path => [
            "/data/logstash/logs/*conf*{{ logstash_path_qualifier }}/freeswitch-log.log"
        ]
        codec => multiline {
            pattern => "%{SYSLOGTIMESTAMP} %{HOSTNAME} freeswitch-log: %{TIMESTAMP_ISO8601} "
            negate => "true"
            what => "previous"
            multiline_tag => "freeswitch_multiline"
        }
    }

    file {
        type => "freeswitch-master"
        path => [
            "/data/logstash/logs/*conf*{{ logstash_path_qualifier }}/freeswitch-master.log"
        ]
    }    
    
    file {
        type => "chatdb-mysql-audit"
        path => [
            "/data/logstash/logs/*chatdb*{{ logstash_path_qualifier }}/mysql-audit.log"
        ]
    }

    file {
        type => "confdb-mysql-audit"
        path => [
            "/data/logstash/logs/*confdb*{{ logstash_path_qualifier }}/mysql-audit.log"
        ]
    }

    file {
        type => "nginx-alb"
        path => [
            "/data/logstash/logs/*alb*{{ logstash_path_qualifier }}/nginx-access.log",
            "/data/logstash/logs/*alb*{{ logstash_path_qualifier }}/nginx-error.log"
        ]
    }

    file {
        type => "nginx-conference"
        path => [
            "/data/logstash/logs/*conf*{{ logstash_path_qualifier }}/nginx-access.log",
            "/data/logstash/logs/*conf*{{ logstash_path_qualifier }}/nginx-error.log"
        ]
    }

    file {
        type => "nginx-portal"
        path => [
            "/data/logstash/logs/*portal*{{ logstash_path_qualifier }}/nginx-access.log",
            "/data/logstash/logs/*portal*{{ logstash_path_qualifier }}/nginx-error.log"
        ]
    }

    file {
        type => "openfire-error"
        path => [
            "/data/logstash/logs/*chat*{{ logstash_path_qualifier }}/openfire-error.log"
        ]
    }

    file {
        type => "openfire-info"
        path => [
            "/data/logstash/logs/*chat*{{ logstash_path_qualifier }}/openfire-info.log"
        ]
    }

}

filter {

    if "bbb-web" in [path] {
        grok { 
            patterns_dir => ["/etc/logstash/patterns"]
            match => { "message" => "%{BBB_WEB}" }
        }    
    }

    if "freeswitch-log" in [path] {
        if "freeswitch_multiline" in [tags] {
            # If we find a multiline entry, strip out the recurring prefixes that occur mid-line
            mutate { 
                gsub => [
                    "message",
                    # NOTE - gsub does not recognize predefined grok patterns, so we have to hand enter them
                    "\n\b(?:Jan(?:uary|uar)?|Feb(?:ruary|ruar)?|M(?:a|ä)?r(?:ch|z)?|Apr(?:il)?|Ma(?:y|i)?|Jun(?:e|i)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|O(?:c|k)?t(?:ober)?|Nov(?:ember)?|De(?:c|z)(?:ember)?)\b +(?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9]) (?!<[0-9])(?:2[0123]|[01]?[0-9]):(?:[0-5][0-9])(?::(?:(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?))(?![0-9]) \b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b) freeswitch-log: ",
                    " "
                ]
            }
        }
        grok { 
            patterns_dir => ["/etc/logstash/patterns"]
            match => { "message" => "%{FREESWITCH_LOG}" }
        }    
    }
    
    if "freeswitch-master" in [path] {
        grok { 
            patterns_dir => ["/etc/logstash/patterns"]
            match => { "message" => "%{FREESWITCH_MASTER}" }
        }    
    }

    if "mysql-audit" in [path] {
        grok { 
            patterns_dir => ["/etc/logstash/patterns"]
            match => { "message" => "%{MYSQL_AUDIT}" }
        }    
    }
    
    if "nginx-access" in [path] {
        grok { 
            patterns_dir => ["/etc/logstash/patterns"]
            match => { "message" => "%{NGINX_ACCESS}" }
        }    
    }

    if "nginx-error" in [path] {
        grok { 
            patterns_dir => ["/etc/logstash/patterns"]
            match => { "message" => "%{NGINX_ERROR}" }
        }
    }

    if "openfire-error" in [path] {
        grok { 
            patterns_dir => ["/etc/logstash/patterns"]
            match => { "message" => "%{OPENFIRE_ERROR}" }
        }    
    }

    if "openfire-info" in [path] {
        grok { 
            patterns_dir => ["/etc/logstash/patterns"]
            match => { "message" => "%{OPENFIRE_INFO}" }
        }    
    }

    if "Guest" in [full_name] { mutate { add_tag => "guest_user" } }
    if " FOO " in [full_name] { mutate { add_tag => "FOO" } }
    if " BAR " in [full_name] { mutate { add_tag => "BAR" } }
    if " BAZ " in [full_name] { mutate { add_tag => "BAZ" } }

}

output {
    elasticsearch {
        hosts => [ "localhost:9200" ]
        # No "index" option is set, so the plugin's default index name is used
    }
}

...and the Logstash settings (logstash.yml), in case you want those too:

path.data: /var/lib/logstash
path.config: /etc/logstash/conf.d
config.reload.automatic: true
path.logs: /var/log/logstash

Hi,

According to the elasticsearch output plugin documentation, Logstash defaults to the index name "logstash-%{+YYYY.MM.dd}" if none is set explicitly.
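Spelled out, that default behaves like the sketch below, assuming the Logstash version used in this thread (newer releases may default to index lifecycle management or data streams instead). The `%{+YYYY.MM.dd}` part is formatted from each event's `@timestamp` in UTC, not from when Logstash started, so a new index appears when the first event of a new UTC day arrives (which need not be local midnight), and each index holds that day's events. Because the pipeline above has no date filter, `@timestamp` is simply the time Logstash read each line, which is why the indices only begin after the pipeline was plugged in:

```
output {
  elasticsearch {
    hosts => [ "localhost:9200" ]
    # Same as leaving "index" unset in this Logstash version:
    # one index per UTC day of each event's @timestamp
    index => "logstash-%{+YYYY.MM.dd}"
  }
}
```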

Refer to this link for more details.

Hence, if you don't want a new index created every day, you need to set the index name explicitly in the pipeline's output section:

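output {
  elasticsearch {
    hosts => [ "localhost:9200" ]
    # Fixed name: every event goes to this one index (index names must be lowercase)
    index => "<index_name>"
    # "index" is already the default action; shown here for clarity
    action => "index"
  }
}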
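If the aim is names that say what an index contains, the `index` option also accepts field references. A sketch, assuming every input sets `type` (as each file input in the pipeline above does) and that the type values stay lowercase. Keeping the `logstash-` prefix is a deliberate choice here: the template Logstash installs by default matches `logstash-*`, and an existing Kibana index pattern of `logstash-*` would still match too:

```
output {
  elasticsearch {
    hosts => [ "localhost:9200" ]
    # One index per input type per UTC day, e.g. "logstash-nginx-portal-<date>".
    # Relies on every event carrying a lowercase "type" field.
    index => "logstash-%{type}-%{+YYYY.MM.dd}"
  }
}
```

Kibana could then use a narrower pattern such as `logstash-nginx-*` to look at one kind of log.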

Why do you want a single big index?
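If the concern is the number of indices rather than needing exactly one, a coarser date pattern is a middle ground. A sketch, assuming monthly granularity is acceptable: Logstash would then create roughly one index per UTC month instead of one per day, which also means far fewer shards on a single node where each index gets 5 primaries:

```
output {
  elasticsearch {
    hosts => [ "localhost:9200" ]
    # One index per UTC month of @timestamp instead of one per day
    index => "logstash-%{+YYYY.MM}"
  }
}
```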
