Logstash duplicate messages

Hi,

I really need your help.

I use logstash-forwarder on one server to ship my logs to another server that runs Logstash,
and I visualize all of my logs with Kibana. (Of course I use Elasticsearch for the analysis, but I don't think that is the problem here.)


  1. My first problem is that my logs are duplicated between the source log file and Kibana. I have tried many solutions found on the web, but nothing resolves the issue.
  2. My second issue: why is the timestamp in Kibana two hours ahead of the timestamp in my log file?

/etc/logstash-forwarder.conf

{
    "network":
    {
      "servers": [ "192.168.250.65:5000" ],
      "ssl ca": "/etc/pki/tls/certs/logstash-forwarder.crt",
      "timeout": 15
    },
  "files":
  [
    {
      "paths":
      [
        "/data/jboss/jboss-fuse-6.0.0.redhat-024/data/log/monitoring.lo*"
      ],
      "fields":
      {
        "type": "transporter"
      }
    }
  ]
}

/etc/logstash/conf.d/01-lumberjack-input.conf

input {
        lumberjack {
                port => 5000
                type => "logs"
                ssl_certificate => "/etc/pki/tls/certs/logstash-forwarder.crt"
                ssl_key => "/etc/pki/tls/private/logstash-forwarder.key"
        }
}

/etc/logstash/conf.d/11-transporter.conf

filter {
  if [type] == "transporter" {
    grok {
      match => [
         "message", "%{MESSAGE}",
         "message", "%{MESSAGE_PLAN}",
         "message", "%{MESSAGE_OLD}"
      ]
    }
  }
}

/etc/logstash/conf.d/30-lumberjack-output.conf

output {
  if [type] == "transporter" {
    elasticsearch {
      host => "localhost"
      index => "transporter-%{+YYYY.MM.dd}"
      template => "/etc/logstash/mapping/transporter.json"
      template_name => "transporter"
      template_overwrite => true
    }
  }
  else
  {
     elasticsearch { host => "localhost" }
     stdout { codec => rubydebug }
  }
}

My patterns are described here:

MESSAGE_OLD %{GREEDYDATA:log_date} \| Message:\|%{GREEDYDATA:message_type}\|:\[%{GREEDYDATA:component}]:%{GREEDYDATA:scope};%{GREEDYDATA:issue_date};%{GREEDYDATA:title};%{GREEDYDATA:edition};%{GREEDYDATA:file_name}\|%{GREEDYDATA:message_detail}
MESSAGE %{GREEDYDATA:log_date} \| Message:\|%{GREEDYDATA:message_type}\|:\[%{GREEDYDATA:component}]:%{GREEDYDATA:scope};%{GREEDYDATA:issue_date};%{GREEDYDATA:title};%{GREEDYDATA:edition};%{GREEDYDATA:file_name};%{GREEDYDATA:file_type};page:%{GREEDYDATA:page_number}\|%{GREEDYDATA:message_detail}
MESSAGE_PLAN %{GREEDYDATA:log_date} \| %{GREEDYDATA:message_type}\:%{GREEDYDATA:scope};%{GREEDYDATA:issue_date};%{GREEDYDATA:title};%{GREEDYDATA:edition};\[Pages:%{NUMBER:pages},Articles:%{NUMBER:articles},Images:%{NUMBER:images}
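
(For completeness: these are custom patterns, so the grok filter has to be pointed at the file that defines them with patterns_dir. A minimal sketch, assuming the patterns are saved in a file under /etc/logstash/patterns; adjust the path to wherever the file really is.)

filter {
  if [type] == "transporter" {
    grok {
      # Assumption: MESSAGE, MESSAGE_PLAN and MESSAGE_OLD are defined in a file
      # inside this directory, e.g. /etc/logstash/patterns/transporter
      patterns_dir => ["/etc/logstash/patterns"]
      match => [
         "message", "%{MESSAGE}",
         "message", "%{MESSAGE_PLAN}",
         "message", "%{MESSAGE_OLD}"
      ]
    }
  }
}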

Example of my log

monitoring.log

22 Jul 2015 17:30:02,684 | Message:|Info|:[ComparechksProducer]:afp;;;;20150722142803_FGJ33.xml;;page:|The file not exists the entryChecksum collection.
22 Jul 2015 17:30:02,685 | Message:|Info|:[comparechks]:afp;;;;fra/20150722142803_FGJ33.xml;;page:|false
22 Jul 2015 17:30:02,711 | Message:|Info|:[ComparechksProducer]:afp;;;;20150722141409_FGI57.xml;;page:|The file not exists the entryChecksum collection.
22 Jul 2015 17:30:02,712 | Message:|Info|:[comparechks]:afp;;;;fra/20150722141409_FGI57.xml;;page:|false
22 Jul 2015 17:30:02,735 | Message:|Info|:[ComparechksProducer]:afp;;;;20150722141908_FGI88.xml;;page:|The file not exists the entryChecksum collection.
22 Jul 2015 17:30:02,736 | Message:|Info|:[comparechks]:afp;;;;fra/20150722141908_FGI88.xml;;page:|false

The same log in Kibana

My template for ElasticSearch

/etc/logstash/mapping/transporter.json

{
    "template": "transporter-*",
    "settings": {
        "index.refresh_interval": "10s",
        "index": {
            "number_of_shards": 3,
            "number_of_replicas": 1
        }
    },
    "mappings": {
        "_default_": {
            "dynamic_templates": [
                {
                    "message_field": {
                        "mapping": {
                            "index": "analyzed",
                            "omit_norms": true,
                            "type": "string"
                        },
                        "match": "message",
                        "match_mapping_type": "string"
                    }
                },
                {
                    "string_fields": {
                        "mapping": {
                            "index": "analyzed",
                            "omit_norms": true,
                            "type": "string",
                            "fields": {
                                "raw": {
                                    "ignore_above": 256,
                                    "index": "not_analyzed",
                                    "type": "string"
                                }
                            }
                        },
                        "match": "*",
                        "match_mapping_type": "string"
                    }
                }
            ],
            "_all": {
                "enabled": true,
                "omit_norms": true
            },
            "properties": {
                "@version": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "geoip": {
                    "dynamic": "true",
                    "properties": {
                        "location": {
                            "type": "geo_point"
                        }
                    }
                }
            }
        },
        "transporter": {
            "dynamic_templates": [
                {
                    "message_field": {
                        "mapping": {
                            "index": "analyzed",
                            "omit_norms": true,
                            "type": "string"
                        },
                        "match": "message",
                        "match_mapping_type": "string"
                    }
                },
                {
                    "string_fields": {
                        "mapping": {
                            "index": "analyzed",
                            "omit_norms": true,
                            "type": "string",
                            "fields": {
                                "raw": {
                                    "ignore_above": 256,
                                    "index": "not_analyzed",
                                    "type": "string"
                                }
                            }
                        },
                        "match": "*",
                        "match_mapping_type": "string"
                    }
                }
            ],
            "_all": {
                "enabled": true,
                "omit_norms": true
            },
            "properties": {
                "@timestamp": {
                    "type": "date",
                    "format": "dateOptionalTime"
                },
                "@version": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "articles": {
                    "type": "integer"
                },
                "component": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "edition": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "file": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "file_name": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "file_type": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "log_date": {
                    "type": "date",
                    "format": "dd MMM yyyy HH:mm:ss,SSS",
		    "index": "not_analyzed"
                },
                "host": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "images": {
                    "type": "integer"
                },
                "issue_date": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "message": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "message_detail": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "message_type": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "page_number": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "pages": {
                    "type": "integer"
                },
                "scope": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "title": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "type": {
                    "type": "string",
                    "index": "not_analyzed"
                }
            }
        }
    }
}

Thanks for your help, and if you need any other information, don't hesitate to ask!

Does nobody have an idea to help me?

For duplicate messages, try using this:

filter {
  uuid {
    target => "@uuid"
    overwrite => true
  }
  fingerprint {
    source => ["message"]
    target => "fingerprint"
    key => "78787878"
    method => "SHA1"
    concatenate_sources => true
  }
}
output {
  elasticsearch {
    host => "localhost"
    document_id => "%{fingerprint}"
  }
  stdout { codec => rubydebug }
}

This will create a hash of your message, and when an exact duplicate arrives it will overwrite the existing document instead of creating a new one. (If you don't want to keep two logs that have the same content but different timestamps, you can use mutate{} https://www.elastic.co/guide/en/logstash/current/plugins-filters-mutate.html to remove the timestamp before fingerprinting.)
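
Something along these lines, for example (a rough sketch: fingerprint_source is just an illustrative field name, and the gsub regex is tailored to the log lines you posted):

filter {
  # Copy the raw line, then strip the leading "22 Jul 2015 17:30:02,684 | "
  # timestamp, so identical payloads logged at different times hash the same.
  mutate {
    add_field => { "fingerprint_source" => "%{message}" }
  }
  mutate {
    gsub => [ "fingerprint_source", "^\d{2} \w{3} \d{4} \d{2}:\d{2}:\d{2},\d{3} \| ", "" ]
  }
  fingerprint {
    source => ["fingerprint_source"]
    target => "fingerprint"
    key => "78787878"
    method => "SHA1"
  }
}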

As for the timestamp, try using the date {} filter: http://stackoverflow.com/questions/26035136/logstash-custom-date-log-format-match
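
Something like this, for instance (a rough sketch using the log_date field your grok patterns extract; the timezone value is a guess, use whatever zone the server writing monitoring.log is actually in):

filter {
  if [type] == "transporter" {
    date {
      # Turn the timestamp extracted by grok into @timestamp.
      match => [ "log_date", "dd MMM yyyy HH:mm:ss,SSS" ]
      # Guess: the application logs in local time (GMT+2 in summer);
      # replace with the real zone of the machine writing the log.
      timezone => "Europe/Paris"
    }
  }
}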


I used

document_id => "%{fingerprint}"

and then I removed that line again, and now my logs are no longer duplicated. I don't understand why, but if someone has this problem, try adding it. (I removed it because it deduplicated too many of my log entries; after removing the line, everything was fine.)
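
For anyone who wants to try it, the line goes in the elasticsearch output, e.g. in my 30-lumberjack-output.conf it would look like this (just a sketch; only useful if collapsing identical lines is really what you want):

output {
  if [type] == "transporter" {
    elasticsearch {
      host => "localhost"
      index => "transporter-%{+YYYY.MM.dd}"
      # Events with the same fingerprint overwrite each other instead of piling up.
      document_id => "%{fingerprint}"
    }
  }
}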

And as for my problem with the time: it was because my logs don't include a time zone. I'm in GMT+2, so when Elasticsearch parses the date without a time zone it doesn't take the offset into account, and the time ends up shifted by two hours in Kibana.

I'm describing my solution here in case it helps someone.

Thanks for your help
