Noob Help with duplicates

Hi Everyone,

I am seeing all messages duplicated when I search in Kibana.

Example:
Time                           _type          MessageHeader.MessageId               _id
March 15th 2017, 12:22:22.344  Rail2_chatter  2f7176cc-7bcd-4386-bed0-58ce971d90e1  AVrTa88VyOalNeB0_gFH
March 15th 2017, 12:22:22.344  Rail2_chatter  2f7176cc-7bcd-4386-bed0-58ce971d90e1  AVrTa88VyOalNeB0_gFI
March 15th 2017, 12:22:22.344  logs           2f7176cc-7bcd-4386-bed0-58ce971d90e1  AVrTa88qyOalNeB0_gFJ
March 15th 2017, 12:22:22.344  logs           2f7176cc-7bcd-4386-bed0-58ce971d90e1  AVrTa88qyOalNeB0_gFK

I have a unique field, [MessageHeader][MessageId], and I only want to see each message once. These four messages are identical (even the timestamps) except for the _id and _type fields.

Here is the logstash config, redacted to protect the innocent servers:
input {
  rabbitmq {
    arguments => { 'x-ha-policy' => all }
    ... (stuff removed)
    queue => 'logstash'
    vhost => 'delphi'
  }
  rabbitmq {
    arguments => { 'x-ha-policy' => all }
    ... (stuff removed)
    queue => 'poshost01'
    vhost => 'delphi'
  }
}
filter {
  if [MessageHeader] {
    mutate {
      replace => { "[MessageHeader][ProcessedTime]" => "%{@timestamp}" }
    }
  }
}
output {
  if [SourceHeader][SourceType] == "Rail2" {
    if [message][type] == "HEARTBEAT" {
      elasticsearch {
        hosts => "http://[elasticsearch]:[port]"
        index => "[indexname]"
        document_type => "Rail2_heartbeat"
      }
    }
    if [message][type] == "CHATTER" {
      elasticsearch {
        hosts => "http://es01-lab:9200"
        index => "[indexname]"
        document_type => "Rail2_chatter"
      }
    }
  }
  if [SourceHeader][SourceType] == "POS.Host" {
    if [message][type] == "HEARTBEAT" {
      elasticsearch {
        hosts => "http://[elasticsearch]:[port]"
        index => "[indexname]"
        document_type => "POS.Host_heartbeat"
      }
    }
    if [message][type] == "CHATTER" {
      elasticsearch {
        hosts => "http://[elasticsearch]:[port]"
        index => "[indexname]"
        document_type => "POS.Host_chatter"
      }
    }
  }

  stdout { codec => rubydebug }
}

Looking at the config, I'm not sure where the doc type 'logs' came from. I would expect to see the message only once, in the doc type 'Rail2_chatter'. Could the duplicates be caused by the two rabbitmq inputs?

How can I change the config to filter for unique instances of [MessageHeader][MessageId] ?

Thanks muchly!

I'm not sure about the [message][type] if statements, as the message field is a string and I don't see you parsing it with the json filter. Then again, I am not familiar with your data format.

I think there are ways to simplify your output section and make it easier to read.
For example, on every input I add a field called "dst_index", and then in my elasticsearch output

I can do

if [dst_index] =~ /heartbeat/ {
  # heartbeat events get their own settings here
  elasticsearch { }
} else {
  # everything else goes to the index named in the event
  elasticsearch {
    index => "%{dst_index}"
  }
}

This allows me to treat the one heartbeat index differently.
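
To show where dst_index could come from, here is a rough sketch that sets it on an input with the common add_field option. It assumes one queue maps to one destination index; the value "poshost01" is only a placeholder, and the connection settings are left out just as in the original config.

```
input {
  rabbitmq {
    # host and other connection settings omitted
    queue => 'poshost01'
    vhost => 'delphi'
    # placeholder index name, stored on every event from this input
    add_field => { "dst_index" => "poshost01" }
  }
}
```

Every event from that input then carries dst_index, so the output section only needs a conditional on that one field.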

Also, when debugging I like to add a tag in my inputs, filters, and heck, even outputs, so I can see which block is processing the data.
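
A minimal sketch of that tagging idea, using the tags option on an input and add_tag on a filter. The tag names are made up for illustration, and the connection settings are omitted.

```
input {
  rabbitmq {
    # connection settings omitted
    queue => 'logstash'
    vhost => 'delphi'
    tags => ["in_queue_logstash"]          # made-up tag name
  }
}
filter {
  if [MessageHeader] {
    mutate {
      replace => { "[MessageHeader][ProcessedTime]" => "%{@timestamp}" }
      add_tag => ["filter_messageheader"]  # made-up tag name
    }
  }
}
```

The tags show up in the rubydebug stdout output and in Kibana, so for each duplicate you can tell which input delivered it.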

Hope this gives you some ideas.

Oh, and if data is duplicated, it is probably hitting multiple output statements.
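
On the question of keeping only one copy per [MessageHeader][MessageId]: one option, not something in the original config, is to use that field as the Elasticsearch document ID via the document_id setting of the elasticsearch output. A sketch, assuming MessageId really is unique per message:

```
output {
  if [SourceHeader][SourceType] == "Rail2" and [message][type] == "CHATTER" {
    elasticsearch {
      hosts => "http://es01-lab:9200"
      index => "[indexname]"
      document_type => "Rail2_chatter"
      # assumes MessageId is unique per message; a second copy then
      # overwrites the first instead of creating a new document
      document_id => "%{[MessageHeader][MessageId]}"
    }
  }
}
```

This should only collapse duplicates within the same index and document type. It would not explain the copies under 'logs', which as far as I know is the type the elasticsearch output falls back to when neither document_type nor a type field is set, so those probably come from another output block, possibly in a different config file that Logstash loads alongside this one.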
