Index creation by date issue

Hi all! Firstly, let me thank you all for your help!

I'm seeing some strange behaviour with Filebeat/Logstash and Elasticsearch. Let me explain:

I have to index log files from last month, and I'm forcing Logstash to replace the processing timestamp with the event timestamp in this way:

filter {
  if [type] == "sas" {
    grok {
      match => { "message" => "%{IP:client} - - \[%{HTTPDATE:timestamp}\] \"POST /RTDM/rest/decisions/%{GREEDYDATA:tarificacion} %{DATA:protocol}\" %{NUMBER:code} %{NUMBER:bytes}" }
    }
    date {
      match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
      locale => "en"
    }
  }
}
output {
  stdout { codec => json_lines }
  elasticsearch {
    hosts => [
      "a.local:9200",
      "b.local:9200",
      "c.local:9200"
    ]
    sniffing => true
    index => "sas-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
  }
}
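
For reference, this is the kind of access-log line that grok pattern is meant to match (values invented for illustration):

10.0.0.1 - - [15/Jun/2017:10:23:45 +0200] "POST /RTDM/rest/decisions/tarifa HTTP/1.1" 200 512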

In addition, I've configured Filebeat to process only log files from last month:

- /opt/sas/sasconfig/Lev1/Web/WebAppServer/SASServer7_1/logs/localhost_access_log.2017-06*.txt
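
For context, the relevant prospector section of my filebeat.yml looks roughly like this (Filebeat 5.x syntax; the Logstash host name is just a placeholder):

filebeat.prospectors:
- input_type: log
  paths:
    - /opt/sas/sasconfig/Lev1/Web/WebAppServer/SASServer7_1/logs/localhost_access_log.2017-06*.txt
  # sets [type] == "sas" on each event, which the Logstash filter checks
  document_type: sas

output.logstash:
  # placeholder host; the port matches the beats input on the Logstash side
  hosts: ["logstash.local:5044"]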

The thing is, I get one index per day (great!), but an additional index for the current day (2017-07-07) is also created, with a lot of documents (it looks like the sum of all the others).

Is there any way to fix this and get rid of the current-date index? Is this the expected behaviour?

Thanks a lot in advance!

No, this is not expected. Do you have the configuration snippet above in a file in /etc/logstash/conf.d? Do you have any other files in that directory? Show an example of a document that ends up in the current day index.
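
For example, something like this against one of your nodes should pull back a sample document (adjust the index name to today's date):

curl -s 'http://a.local:9200/sas-2017.07.07/_search?size=1&pretty'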

Hi Magnus! Thanks a lot for your response!!!
At this moment, I have the following files in a "pipeline" directory. The Logstash process is started pointing at that directory:

02-beats-input.conf 11-sas.conf 30-sas-output.conf

input {
  beats {
    port => 5044
    ssl => false
    codec => plain {
      charset => "ISO-8859-1"
    }
  }
}

filter {
  if [type] == "sas" {
    grok {
      match => { "message" => "%{IP:client} - - \[%{HTTPDATE:timestamp}\] \"POST /RTDM/rest/decisions/%{GREEDYDATA:tarificacion} %{DATA:protocol}\" %{NUMBER:code} %{NUMBER:bytes}" }
    }
    date {
      match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
      locale => "en"
    }
  }
}

output {
  stdout { codec => json_lines }
  elasticsearch {
    hosts => [
      "a.local:9200",
      "b.local:9200",
      "c.local:9200"
    ]
    sniffing => true
    index => "sas-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
  }
}

Log events are sent from the machines via Filebeat, with "sas" included as the document type.

And finally, Elasticsearch creates one index per day for the last month, plus an additional one for the current day.

I'm sure I'm doing something wrong, but I have no idea what xD

Show an example of a document that ends up in the current day index.

Just by way of explanation, Logstash's Elasticsearch output plugin does not actually create indices. I know. It sounds confusing, right?

Logstash sends repeated bulk requests composed of "index" operations. Each action line in these bulk requests specifies the index into which Elasticsearch should write that event's data; in your example that is an index named "sas-%{+YYYY.MM.dd}". Logstash derives the values for YYYY.MM.dd directly from the @timestamp field of each event as it streams by.
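
For illustration, a single event in one of those bulk requests looks roughly like this, with the index name already resolved (field values invented):

{ "index" : { "_index" : "sas-2017.06.15", "_type" : "sas" } }
{ "@timestamp" : "2017-06-15T10:23:45.000Z", "client" : "10.0.0.1", "message" : "..." }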

After that, it is up to Elasticsearch to handle the bulk requests. If the specified index does not exist, Elasticsearch will automatically create it (unless you've disabled that feature).

So, what you've described is completely normal and expected. Because Filebeat does not extrapolate or convert any data, you must override the @timestamp value provided at ingest time with the converted value (which you have done with your date filter).

This brings up the question, "Why is some data going into an index with the wrong date?" The most likely explanation is that for those events the @timestamp value is not being successfully overridden by the date filter, which means the time of ingest is reflected in the bulk request. This happens whenever the grok filter and/or the date filter fails. I suggest having a look at the data in that index, which should reveal the problem. In particular, look for a _grokparsefailure tag in the tags field.
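
For instance, a quick query like this (index name assumed from the date you mentioned) will show whether the documents in that index carry the tag:

curl -s 'http://a.local:9200/sas-2017.07.07/_search?q=tags:_grokparsefailure&size=3&pretty'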


Aaron, Magnus... thanks a lot for your help! I'm on holidays right now, but as soon as I have a computer available, I will give that a try.

As I said before, I'm a bit confused, because it looks like ELK creates the indices correctly, but then creates an additional one with all the documents (or at least, many of them).

Thanks again!

I've finally found the problem:

I had included an exclude regexp in Filebeat in order to exclude certain patterns. On top of that, there is the grok expression in Logstash, but every entry not matching the grok pattern ended up creating the index for the current date.

Sorry for bothering you... :frowning:

Thanks a lot!!
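
In case it helps anyone else: a minimal way to keep those non-matching lines out of the current-day index is to drop events whose grok failed, something like:

filter {
  # discard any event the grok filter could not parse
  if "_grokparsefailure" in [tags] {
    drop { }
  }
}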


Hi @alonsosanchezd, try these timestamp patterns:
ISO8601_TIMEZONE
CISCOTIMESTAMP
It should work.