Wrong indexing of Documents using ELK


(SRI HARSHA KOMMANA) #1

Whenever I try to index my log data, some of it ends up being indexed into a new, unexpected index. My Logstash configuration file is as follows:

input {
    file {
            path => "/home/user/DATA/*.log"
            start_position => "beginning"
            #sincedb_path => "/dev/null"
    }
}

filter {
    if [message] =~ "^#" {
            drop {}
    }

    grok {
            match => ["message", "%{TIMESTAMP_ISO8601:log_timestamp} %{WORD:serviceName} %{WORD:serverName} %{IP:serverIP} %{WORD:method} %{URIPATH:uriStem} %{NOTSPACE:uriQuery} %{NUMBER:port} %{NOTSPACE:username} %{IPORHOST:clientIP} %{NOTSPACE:protocolVersion} %{NOTSPACE:userAgent} %{NOTSPACE:cookie} %{NOTSPACE:referer} %{NOTSPACE:requestHost} %{NUMBER:response} %{NUMBER:subresponse} %{NUMBER:win32response} %{NUMBER:bytesSent} %{NUMBER:bytesReceived} %{NUMBER:timetaken}"]
    }

    date {
            match => [ "log_timestamp", "YYYY-MM-dd HH:mm:ss" ]
            timezone => "UTC"
    }

    mutate {
            convert => ["bytesSent", "integer"]
            convert => ["bytesReceived", "integer"]
            convert => ["timetaken", "integer"]

            remove_field => [ "log_timestamp", "serviceName", "serverName", "serverIP", "port", "username", "protocolVersion", "requestHost", "subresponse", "win32response"]
    }
}

output {
    elasticsearch {
            index => "log-%{+YYYY.MM.dd}"
    }
}

The contents of the data folder are as follows:

u_ex180101.log  u_ex180105.log  u_ex180109.log  u_ex180113.log  u_ex180117.log
u_ex180102.log  u_ex180106.log  u_ex180110.log  u_ex180114.log
u_ex180103.log  u_ex180107.log  u_ex180111.log  u_ex180115.log
u_ex180104.log  u_ex180108.log  u_ex180112.log  u_ex180116.log

And when I run Logstash, the indices that get created are:

health status index            uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   log-2018.01.12   3DHyNchJTVaSQvtAiZ6TuA   5   1    1362426            0        1gb            1gb
yellow open   log-2018.01.17   ltxUNXFJSFyJVAVz_KG_YQ   5   1    1155733            0      864mb          864mb
yellow open   log-2018.01.05   WBwmPX3nQF2_ZvhkTPZ2zA   5   1    1552917            0      1.1gb          1.1gb
yellow open   log-2018.01.10   MKJRVuPmQ4qjnuqvwu9_YA   5   1    1391996            0        1gb            1gb
yellow open   log-2018.01.04   6jpc83lURHuGI8KkZPBMSg   5   1    1489481            0      1.1gb          1.1gb
yellow open   log-2018.01.08   ownjIazZRCCl63EYC6gwwA   5   1    1379748            0        1gb            1gb
yellow open   log-2018.01.11   n5adTn1MS5SK-Jiq7LPEXg   5   1    1337156            0   1015.6mb       1015.6mb
green  open   .kibana          KPWKemzqR86CFvXAcaWv5A   1   0         30            2     48.8kb         48.8kb
yellow open   test-2018.01.17  9XHtlc9URRq-MteTiW30MA   5   1    1336650            0     1018mb         1018mb
yellow open   log-2018.02.16   Ylm_0gphQq-ywc4eHK113w   5   1    2934940            0    955.7mb        955.7mb
yellow open   log-2018.01.03   Aa6K-Y-pQMKYeCiJZvYh3A   5   1    1511993            0      1.1gb          1.1gb
yellow open   log-2018.01.02   -uF0z-2VS_2s0rhgJeTh6A   5   1     203437            0    159.8mb        159.8mb
yellow open   test2-2018.02.20 xAyQ7xjfSXOWyx0A2O8KXg   5   1          5            0     32.2kb         32.2kb
yellow open   log-2018.01.06   OH-V5lFpT4al0DnPCvA4mA   5   1    1675121            0      1.2gb          1.2gb
yellow open   log-2018.01.14   4NlViLLqRPiSd996-KSjJA   5   1    1314180            0      997mb          997mb
yellow open   test-2018.02.20  IE7moiPPTkSA4cl18NQzog   5   1          5            0     32.2kb         32.2kb
yellow open   log-2018.01.15   7Ahq-r8SREepwKEs-Uv7BA   5   1    1264421            0    968.9mb        968.9mb
yellow open   test2-2018.01.16 bg2zFOZdQmWnElWLJVSNPA   5   1     180917            0    145.4mb        145.4mb
yellow open   log-2018.01.07   KIdM2tr1QVGmSCOjMo68WQ   5   1    1366416            0        1gb            1gb
yellow open   log-2018.01.13   YTjQscXeT66a97c1YtIabQ   5   1    1338467            0   1022.8mb       1022.8mb
yellow open   test2-2018.01.17 4nEBJvXsR1GcF6rB92Hj0w   5   1    1155733            0    868.9mb        868.9mb
yellow open   log-2018.01.09   njq_Ju3QQkuJPDoOHqMXoQ   5   1    1449409            0        1gb            1gb
yellow open   log-2018.01.16   Zs8FQwCDRpKgyf87x1fODA   5   1    1312545            0    998.2mb        998.2mb

For every log file, some of its contents are being indexed into a new index named after the current date, and the contents of u_ex180101.log are sent entirely to that current-date index as well. After some testing, I found that around 5 log lines from every file end up in the current-date index. Why is this happening, and what do I need to do to prevent it?

Thanks in advance.


(Magnus Bäck) #2

What does a document that ends up in an index named after the current date look like?


(SRI HARSHA KOMMANA) #3

@magnusbaeck

message: 2018-01-02 16:16:16 W3SVC2 On-Web 172.31.26.231 GET /inoxmovies/ - 80 - 27.6.137.234 HTTP/1.1 Screaming+Frog+SEO+Spider/8.3 - - - 200 0 0 48268 264 51
@version: 1
tags: _grokparsefailure
@timestamp: February 16th 2018, 17:53:27.785
host: LMINHYDTGOIN005
path: /home/DATA/u_ex180102.log
_id: n-eRnmEB5MTPjE-hxwuY
_type: doc
_index: log-2018.02.16
_score: 5.131


Sorry, I didn't notice the _grokparsefailure tag before. But I can't understand what's causing it.


(SRI HARSHA KOMMANA) #4

Thanks @magnusbaeck for taking the time to look into this issue.

It turns out that for the two days that are missing from the log indices, the server name was different, and that is what caused the grok parse failure. That solves that part of the issue.
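The sample document above shows this: its serverName is On-Web, and the %{WORD} pattern stops at the hyphen, so the whole grok match fails. One way to accept such names could be %{NOTSPACE} instead of %{WORD} for that field (just a sketch, with the rest of the pattern unchanged):

grok {
        # %{NOTSPACE} matches any run of non-whitespace characters, so a
        # hyphenated server name such as "On-Web" parses; everything else
        # is the same pattern as in the original configuration
        match => ["message", "%{TIMESTAMP_ISO8601:log_timestamp} %{WORD:serviceName} %{NOTSPACE:serverName} %{IP:serverIP} %{WORD:method} %{URIPATH:uriStem} %{NOTSPACE:uriQuery} %{NUMBER:port} %{NOTSPACE:username} %{IPORHOST:clientIP} %{NOTSPACE:protocolVersion} %{NOTSPACE:userAgent} %{NOTSPACE:cookie} %{NOTSPACE:referer} %{NOTSPACE:requestHost} %{NUMBER:response} %{NUMBER:subresponse} %{NUMBER:win32response} %{NUMBER:bytesSent} %{NUMBER:bytesReceived} %{NUMBER:timetaken}"]
}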

The 5 or 6 lines per log file that seemed to be misindexed also contain some unrecognised characters, so Logstash was throwing a grok parse error for those as well. Without a parsed log_timestamp, the date filter can't set @timestamp, so it falls back to the processing time, and that is why those events were being indexed under the current date.
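To keep any events that still fail to parse out of the date-based indices, a conditional output along these lines could be used (just a sketch; the log-failed index name is only an example):

output {
    if "_grokparsefailure" in [tags] {
            elasticsearch {
                    # events grok could not parse go to a separate index
                    # instead of polluting the date-based indices
                    index => "log-failed-%{+YYYY.MM.dd}"
            }
    } else {
            elasticsearch {
                    index => "log-%{+YYYY.MM.dd}"
            }
    }
}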


(system) #5

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.