Hi all,
I use Elasticsearch 6.2.2 in a really basic test environment (no cluster, single node).
I've just realized that every record is duplicated exactly twice. The two copies have different _id values, but all other fields are identical.
Here is a short example:
record 1
  @timestamp: March 22nd 2018, 17:17:20.700
  @version:   1
  _id:        HEGATmIBZYl-EB49OMRc
  _index:     logstash-2018.03.22
  message:    Connection lost after 0 seconds
record 2
  @timestamp: March 22nd 2018, 17:17:20.700
  @version:   1
  _id:        JUGATmIBZYl-EB49OsRI
  _index:     logstash-2018.03.22
  message:    Connection lost after 0 seconds
I ran Logstash in debug mode but didn't see any duplicate data.
I also tried adding a second Elasticsearch output in Logstash: the data is not duplicated in that one, but it is always duplicated in the first Elasticsearch.
This is my logstash.conf:
input {
  udp {
    host => "127.0.0.1"
    port => 10514
    codec => "json"
    type => "rsyslog"
  }
}

# This is an empty filter block. You can later add other filters here to further process
# your log lines
# https://gist.github.com/mesimeris/bf6cd912d11b674c4a2b
filter {
  if [sysloghost] == "192.168.0.1" {
    grok {
      patterns_dir => "/etc/logstash/grok/mikrotik.pattern"
      match => { "message" => "%{MIKROTIKFIREWALL}" }
    }
  }
}

# This output block will send all events of type "rsyslog" to Elasticsearch at the configured
# host and port into daily indices of the pattern, "rsyslog-YYYY.MM.DD"
output {
  if "_grokparsefailure" in [tags] {
    # write events that didn't match to a file
    file { path => "/tmp/grok_failures.txt" }
  } else {
    if [type] == "rsyslog" {
      elasticsearch { hosts => [ "127.0.0.1:9200" ] }
      elasticsearch { hosts => [ "192.168.0.122:9200" ] }
      stdout { codec => rubydebug }
    }
  }
}
Logstash concatenates config files it finds in its directory. Do you by any chance have any other config files that specify the same Elasticsearch output?
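To illustrate what that means in practice (a sketch only, the file names are made up): if your config directory contains, say, 10-rsyslog.conf and 20-other.conf, Logstash merges them into one single pipeline, so every event from every input passes through every output block unless the block is guarded by a conditional:

# 10-rsyslog.conf (hypothetical)
output {
  if [type] == "rsyslog" {                           # guarded: only rsyslog events land here
    elasticsearch { hosts => [ "127.0.0.1:9200" ] }
  }
}

# 20-other.conf (hypothetical)
output {
  elasticsearch { hosts => [ "127.0.0.1:9200" ] }    # unguarded: receives ALL events,
                                                     # so rsyslog events get indexed twice
}

With an unguarded output like the second one, each event is indexed once per block and gets a fresh auto-generated _id each time, which would match the symptom you describe.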
Yes, I have two other Logstash configs with the same output but different inputs and filters.
I added the second Elasticsearch output to the second Logstash config as well. The data is still not duplicated on the second Elasticsearch and still duplicated on the first.
input {
  # this is the actual live log file to monitor
  #file {
  #  path => ["/home/cowrie/cowrie-git/log/cowrie.json"]
  #  codec => json
  #  type => "cowrie"
  #}
  # this is to send old logs to for reprocessing
  #tcp {
  #  port => 3333
  #  type => "cowrie"
  #}
  beats {
    port => 5044
    type => "cowrie"
  }
}
filter {
  if [type] == "cowrie" {
    json {
      source => "message"
    }
    date {
      match => [ "timestamp", "ISO8601" ]
    }
    mutate {
      convert => { "dst_port" => "integer" }
      convert => { "src_port" => "integer" }
    }
    if [src_ip] {
      mutate {
        add_field => { "src_host" => "%{src_ip}" }
      }
      dns {
        reverse => [ "src_host" ]
        nameserver => [ "192.168.0.1" ]
        action => "replace"
        hit_cache_size => 4096
        hit_cache_ttl => 900
        failed_cache_size => 512
        failed_cache_ttl => 900
      }
      geoip {
        source => "src_ip"
        target => "geoip"
        #database => "/opt/logstash/vendor/geoip/GeoLite2-City.dat"
        database => "/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-filter-geoip-5.0.3-java/vendor/GeoLite2-City.mmdb"
      }
    }
    mutate {
      remove_tag => [ "beats_input_codec_plain_applied" ]
      # cut useless fields added by filebeat, if you use it of course
      remove_field => [ "source", "offset", "input_type" ]
    }
  }
}
output {
  if [type] == "cowrie" {
    elasticsearch {
      hosts => ["localhost:9200"]
    }
    elasticsearch { hosts => [ "192.168.0.122:9200" ] }
    #file {
    #  path => "/tmp/cowrie-logstash.log"
    #  codec => json
    #}
    #stdout {
    #  codec => rubydebug
    #}
  }
}
Thank you all for your support and tips.
I found the configuration error in my last Logstash conf file:
the "if [type]" line was commented out in the output block.
It's a silly mistake, but not an easy one to spot.
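For anyone hitting the same symptom, the mistake was roughly of this shape (a sketch, not the actual file; the type name here is made up). With the conditional commented out, this output receives every event from the merged pipeline, including the ones another config file already indexes, hence the exact duplicates with different _id values:

output {
  # if [type] == "sometype" {                        # <- commented out by mistake
    elasticsearch { hosts => [ "127.0.0.1:9200" ] }  # now catches ALL events
  # }
}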