Wrong indexing of Documents using ELK

Whenever I try to index log data, some of it is wrongly indexed into a new index. The Logstash configuration file is as follows:

input {
    file {
            path => "/home/user/DATA/*.log"
            start_position => "beginning"
            #sincedb_path => "/dev/null"
    }
}

filter {
    # Header lines start with "#"; drop them so they are not indexed
    if [message] =~ "^#" {
            drop {}
    }

    grok {
            match => ["message", "%{TIMESTAMP_ISO8601:log_timestamp} %{WORD:serviceName} %{WORD:serverName} %{IP:serverIP} %{WORD:method} %{URIPATH:uriStem} %{NOTSPACE:uriQuery} %{NUMBER:port} %{NOTSPACE:username} %{IPORHOST:clientIP} %{NOTSPACE:protocolVersion} %{NOTSPACE:userAgent} %{NOTSPACE:cookie} %{NOTSPACE:referer} %{NOTSPACE:requestHost} %{NUMBER:response} %{NUMBER:subresponse} %{NUMBER:win32response} %{NUMBER:bytesSent} %{NUMBER:bytesReceived} %{NUMBER:timetaken}"]
    }

    date {
            match => [ "log_timestamp", "YYYY-MM-dd HH:mm:ss" ]
            timezone => "UTC"
    }

    mutate {
            convert => ["bytesSent", "integer"]
            convert => ["bytesReceived", "integer"]
            convert => ["timetaken", "integer"]

            remove_field => [ "log_timestamp", "serviceName", "serverName", "serverIP", "port", "username", "protocolVersion", "requestHost", "subresponse", "win32response"]
    }
}

output {
    elasticsearch {
            # Daily index name; the date portion is taken from each event's @timestamp
            index => "log-%{+YYYY.MM.dd}"
    }
}

The contents of the data folder are as follows.

u_ex180101.log  u_ex180105.log  u_ex180109.log  u_ex180113.log  u_ex180117.log
u_ex180102.log  u_ex180106.log  u_ex180110.log  u_ex180114.log
u_ex180103.log  u_ex180107.log  u_ex180111.log  u_ex180115.log
u_ex180104.log  u_ex180108.log  u_ex180112.log  u_ex180116.log

And when I run Logstash, the indices being created are:

health status index            uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   log-2018.01.12   3DHyNchJTVaSQvtAiZ6TuA   5   1    1362426            0        1gb            1gb
yellow open   log-2018.01.17   ltxUNXFJSFyJVAVz_KG_YQ   5   1    1155733            0      864mb          864mb
yellow open   log-2018.01.05   WBwmPX3nQF2_ZvhkTPZ2zA   5   1    1552917            0      1.1gb          1.1gb
yellow open   log-2018.01.10   MKJRVuPmQ4qjnuqvwu9_YA   5   1    1391996            0        1gb            1gb
yellow open   log-2018.01.04   6jpc83lURHuGI8KkZPBMSg   5   1    1489481            0      1.1gb          1.1gb
yellow open   log-2018.01.08   ownjIazZRCCl63EYC6gwwA   5   1    1379748            0        1gb            1gb
yellow open   log-2018.01.11   n5adTn1MS5SK-Jiq7LPEXg   5   1    1337156            0   1015.6mb       1015.6mb
green  open   .kibana          KPWKemzqR86CFvXAcaWv5A   1   0         30            2     48.8kb         48.8kb
yellow open   test-2018.01.17  9XHtlc9URRq-MteTiW30MA   5   1    1336650            0     1018mb         1018mb
yellow open   log-2018.02.16   Ylm_0gphQq-ywc4eHK113w   5   1    2934940            0    955.7mb        955.7mb
yellow open   log-2018.01.03   Aa6K-Y-pQMKYeCiJZvYh3A   5   1    1511993            0      1.1gb          1.1gb
yellow open   log-2018.01.02   -uF0z-2VS_2s0rhgJeTh6A   5   1     203437            0    159.8mb        159.8mb
yellow open   test2-2018.02.20 xAyQ7xjfSXOWyx0A2O8KXg   5   1          5            0     32.2kb         32.2kb
yellow open   log-2018.01.06   OH-V5lFpT4al0DnPCvA4mA   5   1    1675121            0      1.2gb          1.2gb
yellow open   log-2018.01.14   4NlViLLqRPiSd996-KSjJA   5   1    1314180            0      997mb          997mb
yellow open   test-2018.02.20  IE7moiPPTkSA4cl18NQzog   5   1          5            0     32.2kb         32.2kb
yellow open   log-2018.01.15   7Ahq-r8SREepwKEs-Uv7BA   5   1    1264421            0    968.9mb        968.9mb
yellow open   test2-2018.01.16 bg2zFOZdQmWnElWLJVSNPA   5   1     180917            0    145.4mb        145.4mb
yellow open   log-2018.01.07   KIdM2tr1QVGmSCOjMo68WQ   5   1    1366416            0        1gb            1gb
yellow open   log-2018.01.13   YTjQscXeT66a97c1YtIabQ   5   1    1338467            0   1022.8mb       1022.8mb
yellow open   test2-2018.01.17 4nEBJvXsR1GcF6rB92Hj0w   5   1    1155733            0    868.9mb        868.9mb
yellow open   log-2018.01.09   njq_Ju3QQkuJPDoOHqMXoQ   5   1    1449409            0        1gb            1gb
yellow open   log-2018.01.16   Zs8FQwCDRpKgyf87x1fODA   5   1    1312545            0    998.2mb        998.2mb

For every log file, some of its contents are being indexed into a new index named after the current date, and the whole of u_ex180101.log has been sent to a current-date index as well. After that I did some testing, and for every log file around 5 log lines end up in the current-date index. Why is this happening, and what do I need to do to prevent it?

Thanks in advance.

What does a document that ends up in an index named after the current date look like?


@magnusbaeck

message:2018-01-02 16:16:16 W3SVC2 On-Web 172.31.26.231 GET /inoxmovies/ - 80 - 27.6.137.234 HTTP/1.1 Screaming+Frog+SEO+Spider/8.3 - - - 200 0 0 48268 264 51
@version:1
tags:_grokparsefailure
@timestamp:February 16th 2018, 17:53:27.785
host:LMINHYDTGOIN005
path:/home/DATA/u_ex180102.log
_id:n-eRnmEB5MTPjE-hxwuY
_type:doc
_index:log-2018.02.16
_score:5.131

Sorry, I didn't notice the _grokparsefailure tag before, but I can't understand what is causing it.
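For what it's worth, a quick way to see which lines are failing is to temporarily add an extra output that only fires for events tagged with _grokparsefailure, something like this (just a sketch, alongside the existing elasticsearch output):

output {
    if "_grokparsefailure" in [tags] {
            # Print the events grok could not parse so the raw lines can be inspected
            stdout { codec => rubydebug }
    }
}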

Thanks @magnusbaeck for taking the time to look into this issue.

It turns out that for the 2 days that are missing from the indices, the server name in the logs was different, and that is what caused the grok parse failure. That solves that part of the issue.

The 5 or 6 log lines from each file that were misindexed also contain some unrecognised characters, so grok failed on those as well. Because the date filter never ran for those events, @timestamp defaulted to the time Logstash processed them, and they were indexed under the current date.
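In case it helps anyone else hitting this, something like the following should cover both problems. It is only a sketch: %{NOTSPACE:serverName} is my guess at a pattern loose enough for hyphenated server names such as "On-Web", and the "log-failures" index name is arbitrary.

filter {
    grok {
            # Same pattern as before, except %{NOTSPACE:serverName} replaces
            # %{WORD:serverName} so hyphenated server names still match
            match => ["message", "%{TIMESTAMP_ISO8601:log_timestamp} %{WORD:serviceName} %{NOTSPACE:serverName} %{IP:serverIP} %{WORD:method} %{URIPATH:uriStem} %{NOTSPACE:uriQuery} %{NUMBER:port} %{NOTSPACE:username} %{IPORHOST:clientIP} %{NOTSPACE:protocolVersion} %{NOTSPACE:userAgent} %{NOTSPACE:cookie} %{NOTSPACE:referer} %{NOTSPACE:requestHost} %{NUMBER:response} %{NUMBER:subresponse} %{NUMBER:win32response} %{NUMBER:bytesSent} %{NUMBER:bytesReceived} %{NUMBER:timetaken}"]
    }
}

output {
    if "_grokparsefailure" in [tags] {
            # Anything that still fails to parse goes to its own index instead of
            # polluting the date-based indices
            elasticsearch { index => "log-failures" }
    } else {
            elasticsearch { index => "log-%{+YYYY.MM.dd}" }
    }
}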

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.