Logs are overwritten in the specified index under the same _id

Hi there,

I'm using Logstash 6.5.1 and Elasticsearch 6.5.1.

Below is my filebeat.yml:

filebeat.prospectors:
- type: log
  paths:
    - var/log/message
  fields:
    type: apache_access
  tags: ["ApacheAccessLogs"]
- type: log
  paths:
    - var/log/indicate
  fields:
    type: apache_error
  tags: ["ApacheErrorLogs"]
- type: log
  paths:
    - var/log/panda
  fields:
    type: mysql_error
  tags: ["MysqlErrorLogs"]

output.logstash:
  # The Logstash hosts
  hosts: ["logstash:5044"]

Below is my Logstash config file:

input {
  beats {
    port => 5044
    tags => [ "ApacheAccessLogs", "ApacheErrorLogs", "MysqlErrorLogs" ]
  }
}
filter {
  if "ApacheAccessLogs" in [tags] {
    grok {
      match => [
        "message", "%{COMBINEDAPACHELOG}+%{GREEDYDATA:extra_fields}",
        "message", "%{COMMONAPACHELOG}+%{GREEDYDATA:extra_fields}"
      ]
      overwrite => [ "message" ]
    }
    mutate {
      convert => ["response", "integer"]
      convert => ["bytes", "integer"]
      convert => ["responsetime", "float"]
    }
    geoip {
      source => "clientip"
      target => "geoip"
      add_tag => [ "apache-geoip" ]
    }
    date {
      match => [ "timestamp", "dd/MMM/YYYY:HH:mm:ss Z" ]
      remove_field => [ "timestamp" ]
    }
    useragent {
      source => "agent"
    }
  }
if "ApacheErrorLogs" in [tags] {
grok {
match => { "message" => ["[%{APACHE_TIME:[apache2][error][timestamp]}] [%{LOGLEVEL:[apache2][error][level]}]( [client %{IPORHOST:[apache2][error][client]}])? %{GREEDYDATA:[apache2][error][message]}",
"[%{APACHE_TIME:[apache2][error][timestamp]}] [%{DATA:[apache2][error][module]}:%{LOGLEVEL:[apache2][error][level]}] [pid %{NUMBER:[apache2][error][pid]}(:tid %{NUMBER:[apache2][error][tid]})?]( [client %{IPORHOST:[apache2][error][client]}])? %{GREEDYDATA:[apache2][error][message1]}" ] }
pattern_definitions => {
"APACHE_TIME" => "%{DAY} %{MONTH} %{MONTHDAY} %{TIME} %{YEAR}"
}
remove_field => "message"
}
mutate {
rename => { "[apache2][error][message1]" => "[apache2][error][message]" }
}
date {
match => [ "[apache2][error][timestamp]", "EEE MMM dd H:m:s YYYY", "EEE MMM dd H:m:s.SSSSSS YYYY" ]
remove_field => "[apache2][error][timestamp]"
}
}
if "MysqlErrorLogs" in [tags] {
grok {
match => { "message" => ["%{LOCALDATETIME:[mysql][error][timestamp]} ([%{DATA:[mysql][error][level]}] )?%{GREEDYDATA:[mysql][error][message]}",
"%{TIMESTAMP_ISO8601:[mysql][error][timestamp]} %{NUMBER:[mysql][error][thread_id]} [%{DATA:[mysql][error][level]}] %{GREEDYDATA:[mysql][error][message1]}",
"%{GREEDYDATA:[mysql][error][message2]}"] }
pattern_definitions => {
"LOCALDATETIME" => "[0-9]+ %{TIME}"
}
remove_field => "message"
}
mutate {
rename => { "[mysql][error][message1]" => "[mysql][error][message]" }
}
mutate {
rename => { "[mysql][error][message2]" => "[mysql][error][message]" }
}
date {
match => [ "[mysql][error][timestamp]", "ISO8601", "YYMMdd H:m:s" ]
remove_field => "[apache2][access][time]"
}
}
}

output {
  if "ApacheAccessLogs" in [tags] {
    elasticsearch {
      hosts => ["elasticsearch:9200"]
      index => "apache"
      document_type => "apacheaccess"
    }
  }
  if "ApacheErrorLogs" in [tags] {
    elasticsearch {
      hosts => ["elasticsearch:9200"]
      index => "apache"
      document_id => "apacheerror"
    }
  }
  if "MysqlErrorLogs" in [tags] {
    elasticsearch {
      hosts => ["elasticsearch:9200"]
      index => "apache"
      document_type => "sqlerror"
    }
  }
  stdout { codec => rubydebug }
}

The data is sent to Elasticsearch, but only three documents get created in the index, one for each document_id.

Every new incoming log overwrites the document with the same document_id, and the old one is lost.

Can you guys please help me out? @magnusbaeck

You are specifying multiple document types for the same index, which causes errors in recent Elasticsearch versions. You also have a fixed document ID specified for one output, which causes the same document to be updated repeatedly.

@Christian_Dahlqvist - What is the best way to split the data? Which field should I use instead of document_type or document_id?

Also, my exact output block is:

output {
  if "ApacheAccessLogs" in [tags] {
    elasticsearch {
      hosts => ["elasticsearch:9200"]
      index => "apache"
      document_id => "apacheaccess"
    }
  }
  if "ApacheErrorLogs" in [tags] {
    elasticsearch {
      hosts => ["elasticsearch:9200"]
      index => "apache"
      document_id => "apacheerror"
    }
  }
  if "MysqlErrorLogs" in [tags] {
    elasticsearch {
      hosts => ["elasticsearch:9200"]
      index => "apache"
      document_id => "sqlerror"
    }
  }
  stdout { codec => rubydebug }
}

I'm using only document_id! How can I write my output block in order to avoid overwriting?

You cannot use a fixed document ID that way, as it is the unique identifier for each document. Remove it and let Elasticsearch assign one.
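For example, a minimal sketch of the output block with document_id removed (reusing the hosts and index names already shown above; a sketch, not necessarily the final fix):

output {
  if "ApacheAccessLogs" in [tags] {
    # Without document_id, Elasticsearch auto-generates a unique _id for
    # every event, so each log line becomes a new document instead of
    # overwriting a single one.
    elasticsearch {
      hosts => ["elasticsearch:9200"]
      index => "apache"
    }
  }
  if "ApacheErrorLogs" in [tags] {
    elasticsearch {
      hosts => ["elasticsearch:9200"]
      index => "apache"
    }
  }
  if "MysqlErrorLogs" in [tags] {
    elasticsearch {
      hosts => ["elasticsearch:9200"]
      index => "apache"
    }
  }
  stdout { codec => rubydebug }
}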

@Christian_Dahlqvist : That's right. But what if two fields with the same name come from two different sources? Won't they clash?

I need to put all the data into one index, and I need another field that helps me segregate the data of one source from the other two.

What is driving this requirement? You can run queries against multiple indexes. You are likely better off keeping different document types in different indexes. You should read about why document types are being removed from Elasticsearch.
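For example, a minimal sketch of the output block with one index per source (the index names here are illustrative, not from the original config):

output {
  if "ApacheAccessLogs" in [tags] {
    elasticsearch {
      hosts => ["elasticsearch:9200"]
      index => "apache-access"    # separate index per log source
    }
  }
  if "ApacheErrorLogs" in [tags] {
    elasticsearch {
      hosts => ["elasticsearch:9200"]
      index => "apache-error"
    }
  }
  if "MysqlErrorLogs" in [tags] {
    elasticsearch {
      hosts => ["elasticsearch:9200"]
      index => "mysql-error"
    }
  }
  stdout { codec => rubydebug }
}

You can still search across all of them at once with an index pattern such as apache-*,mysql-*.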

@Badger : OK, I shall use a separate index for each source.

Thanks! Specifying an individual index per source works well!

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.