Indexes creation by date issue

alonsosanchezd · July 7, 2017, 8:37am

Hi all! Firstly, let me thank you all for your help!

I'm seeing strange behaviour with Filebeat/Logstash and Elasticsearch. Let me explain:

I have to index last month's log files, and I'm forcing Logstash to replace the processing timestamp with the event timestamp like this:

filter {
  if [type] == "sas" {
    grok {
      match => { "message" => "%{IP:client} - - \[%{HTTPDATE:timestamp}\] \"POST /RTDM/rest/decisions/%{GREEDYDATA:tarificacion} %{DATA:protocol}\" %{NUMBER:code} %{NUMBER:bytes}" }
    }
    date {
       match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
       locale => "en"
    }
  }
}
output {
  stdout { codec => json_lines }
  elasticsearch {
    hosts => [
      "a.local:9200",
      "b.local:9200",
      "c.local:9200"
    ]
    sniffing => true
    index => "sas-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
  }
}

In addition, I've configured Filebeat to process only last month's log files:

- /opt/sas/sasconfig/Lev1/Web/WebAppServer/SASServer7_1/logs/localhost_access_log.2017-06*.txt

The result is that I get one index per day (great!), but an additional index for the current day (2017-07-07) is also created, with a lot of documents (it looks like the sum of all the others).

Is there any way to fix this and get rid of the current-date index? Is this the expected behaviour?

Thanks a lot in advance!

magnusbaeck · July 7, 2017, 9:42am

No, this is not expected. Do you have the configuration snippet above in a file in /etc/logstash/conf.d? Do you have any other files in that directory? Show an example of a document that ends up in the current day index.

alonsosanchezd · July 7, 2017, 11:42am

Hi Magnus! Thanks a lot for your response!!!
At the moment I have the following files in a "pipeline" directory, and the Logstash process is started pointing to that directory:

02-beats-input.conf 11-sas.conf 30-sas-output.conf

input {
  beats {
    port => 5044
    ssl => false
    codec => plain {
      charset => "ISO-8859-1"
    }
  }
}

filter {
  if [type] == "sas" {
    grok {
      match => { "message" => "%{IP:client} - - \[%{HTTPDATE:timestamp}\] \"POST /RTDM/rest/decisions/%{GREEDYDATA:tarificacion} %{DATA:protocol}\" %{NUMBER:code} %{NUMBER:bytes}" }
    }
    date {
       match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
       locale => "en"
    }
  }
}

output {
  stdout { codec => json_lines }
  elasticsearch {
    hosts => [
      "a.local:9200",
      "b.local:9200",
      "c.local:9200"
    ]
    sniffing => true
    index => "sas-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
  }
}

Log events are sent from the machines via Filebeat, with "sas" as the document type.

And finally, Elasticsearch creates one index per day of last month, plus an additional one for the current day.

I'm sure I'm doing something wrong, but I have no idea what xD

magnusbaeck · July 12, 2017, 10:37am

Show an example of a document that ends up in the current day index.

theuntergeek · July 12, 2017, 12:51pm

Just by way of explanation, Logstash's Elasticsearch output plugin does not actually create indices. I know. It sounds confusing, right?

Logstash sends repeated bulk requests made up of "index" actions. Each action in a bulk request specifies the index into which it should write that event's data; in your example that will be an index named from the pattern "sas-%{+YYYY.MM.dd}". Logstash derives the values for YYYY.MM.dd directly from the @timestamp field of each event as it streams by.

After that, it is up to Elasticsearch to handle the bulk requests. If the specified index does not exist, Elasticsearch will automatically create it (unless you've disabled that feature).
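To check which index name a given event will carry, one option is to render the same date pattern into a metadata field and print it. This is only a sketch: the field name [@metadata][target_index] is made up for illustration, and the filter needs to run after your date filter (for example in a file that sorts after 11-sas.conf) so that it sees the final @timestamp.

```conf
filter {
  mutate {
    # Render the daily index name from this event's @timestamp.
    # [@metadata][target_index] is a hypothetical field name.
    add_field => { "[@metadata][target_index]" => "sas-%{+YYYY.MM.dd}" }
  }
}

output {
  # @metadata is not printed by default; metadata => true shows it for debugging.
  stdout { codec => rubydebug { metadata => true } }
}
```

Events whose @timestamp was overridden should show a name for a June day, while events that kept their ingest time should show today's date.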

So, what you've described is completely normal and expected. Because filebeat does not extrapolate or convert any data, you must override the @timestamp value provided at ingest time with the converted value (which you have with your date filter).

This brings up the question, "Why am I getting that data that is going into an index with the wrong date?" The most logical explanation is that the data there is not having the @timestamp value successfully overridden by the date filter, which means that the time of ingest will be reflected in the bulk request. This will happen in the event that the grok filter and/or the date filter fail. I suggest having a look at the data in that index, which will explain what the problem is. In particular, look for a _grokparsefailure tag in the tags field.
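If you want to keep those failed events but out of the daily indices while you investigate, a conditional output is one way to do it. Treat this as a sketch under assumptions: "sas-parse-failures" is a hypothetical index name, and the tags checked are the default failure tags of the grok and date filters, so it will not match if you have changed tag_on_failure.

```conf
output {
  if "_grokparsefailure" in [tags] or "_dateparsefailure" in [tags] {
    # Parsing failed, so @timestamp still holds the ingest time.
    # "sas-parse-failures" is a hypothetical index name.
    elasticsearch {
      hosts => ["a.local:9200", "b.local:9200", "c.local:9200"]
      index => "sas-parse-failures"
    }
  } else {
    # Parsing succeeded, so @timestamp holds the event time from the log line.
    elasticsearch {
      hosts => ["a.local:9200", "b.local:9200", "c.local:9200"]
      index => "sas-%{+YYYY.MM.dd}"
    }
  }
}
```

Looking at the message field of the documents in the failure index should show which lines the grok pattern does not cover.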

alonsosanchezd · July 12, 2017, 3:28pm

Aaron, Magnus... Thanks a lot for your help! I'm on holiday right now, but as soon as I have a computer available I will give that a try.

As I said before, I'm a bit confused, because it looks like ELK creates the indexes correctly but also creates an additional one with all the documents (or at least many of them).

Thanks again!

alonsosanchezd · August 1, 2017, 9:13am

Finally I've found the problem:

I've included an exclude regexp in Filebeat in order to exclude certain patterns. In addition, there is the grok expression, but every other entry that did not fit the grok pattern ended up in the index for the current date.

Sorry for bothering you...
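For anyone hitting the same thing, one way to keep non-matching lines out of Elasticsearch altogether is to drop them right after grok. This is a sketch rather than the exact change made here, and it assumes the lines that fail the pattern are not needed at all.

```conf
filter {
  if [type] == "sas" {
    grok {
      match => { "message" => "%{IP:client} - - \[%{HTTPDATE:timestamp}\] \"POST /RTDM/rest/decisions/%{GREEDYDATA:tarificacion} %{DATA:protocol}\" %{NUMBER:code} %{NUMBER:bytes}" }
    }
    if "_grokparsefailure" in [tags] {
      # The line did not match the expected access-log format;
      # discard it so it never reaches the elasticsearch output.
      drop { }
    }
    date {
      match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
      locale => "en"
    }
  }
}
```

With this in place, only events that matched the grok pattern reach the date filter and the output, so no documents should be written with the ingest date.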

Thanks a lot!!

Krunal_kalaria · November 7, 2017, 11:02am

Hi @alonsosanchezd, try these timestamp patterns:
ISO8601_TIMEZONE
CISCOTIMESTAMP
It should work.
