Duplicate documents in Elasticsearch?

I am sending one document to Elasticsearch 5.5 from Logstash, but I am getting duplicate documents: two documents with different IDs but the same fields. Below I am showing only the index, type, and ID.

"hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "logstash-2017.09.01",
        "_type": "threat",
        "_id": "AV4-7XhNlO-DC0yfkKOf",
        "_score": 1
      },
      {
        "_index": "logstash-2017.09.01",
        "_type": "threat",
        "_id": "AV4-7XhklO-DC0yfkKOg",
        "_score": 1
      }
    ]
  }
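One way to spot duplicates without knowing the IDs up front is a terms aggregation with "min_doc_count": 2 on a field that should be unique per event, if such a field exists. A sketch of a search request body (the SourceIP.keyword field is an assumption based on the event shown further down; adjust it to your mapping):

{
  "size": 0,
  "aggs": {
    "duplicates": {
      "terms": {
        "field": "SourceIP.keyword",
        "min_doc_count": 2
      }
    }
  }
}

Any bucket returned contains a value that appears in at least two documents.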

Can you share your configuration?

Sure, this is only one node, see below.

{
  "logstash-2017.09.01": {
    "settings": {
      "index": {
        "creation_date": "1504278620020",
        "number_of_shards": "5",
        "number_of_replicas": "1",
        "uuid": "Bz6r4KcBRDWxwgsTDU5Epg",
        "version": {
          "created": "5050299"
        },
        "provided_name": "logstash-2017.09.01"
      }
    }
  }
}

In fact I am using the popular github.com/deviantony/docker-elk with a vanilla config (easy to recreate the Docker containers).

version: '2'

services:

  elasticsearch:
    build: elasticsearch/
    volumes:
      - ./elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml
    ports:
      - "9200:9200"
      - "9300:9300"
    environment:
      ES_JAVA_OPTS: "-Xmx256m -Xms256m"
    networks:
      - elk

  logstash:
    build: logstash/
    volumes:
      - ./logstash/config/logstash.yml:/usr/share/logstash/config/logstash.yml
      - ./logstash/pipeline:/usr/share/logstash/pipeline
    ports:
      - "11514:11514/udp"
    environment:
      LS_JAVA_OPTS: "-Xmx256m -Xms256m"
    networks:
      - elk
    depends_on:
      - elasticsearch

  kibana:
    build: kibana/
    volumes:
      - ./kibana/config/:/usr/share/kibana/config
    ports:
      - "5601:5601"
    networks:
      - elk
    depends_on:
      - elasticsearch

networks:

  elk:
    driver: bridge

Just to prove the documents are identical:

$ curl -XGET '10.254.253.100:9200/logstash-2017.09.01/_search?q=_id:"AV4-7XhklO-DC0yfkKOg"&pretty' > es1
$ curl -XGET '10.254.253.100:9200/logstash-2017.09.01/_search?q=_id:"AV4-7XhNlO-DC0yfkKOf"&pretty' > es2
$ diff es1 es2
16c16
<         "_id" : "AV4-7XhklO-DC0yfkKOg",
---
>         "_id" : "AV4-7XhNlO-DC0yfkKOf",

And this is not a Logstash problem, because I am also logging to stdout and getting only one event there:

output {
        if [type] == "threat" {
                elasticsearch {
                        hosts => ["10.254.253.100:9200"]
                        manage_template => false
                }
                stdout { codec => rubydebug }
        }
}

$ docker logs dockerelk_logstash_1 | grep 8.8.8.8

               "SourceIP" => "8.8.8.8",

What does the entire Logstash config look like?

I added UUID fingerprinting to Logstash and now the problem has disappeared.

filter {
        if [type] == "threat" {
                fingerprint {
                        target => "%{[@metadata][uuid]}"
                        method => "UUID"
                }
        }
}

output {
        if [type] == "threat" {
                elasticsearch {
                        hosts => ["10.254.253.100:9200"]
                        manage_template => false
                        document_id => "%{[@metadata][uuid]}"
                        index => "debug"
                        template_name => "debug"
                }
                stdout { codec => rubydebug }
        }
}
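Note that method => "UUID" generates a random ID for each event as it passes through the pipeline, so it only prevents duplicates created downstream of the filter (for example, output retries); if the same event is ever re-read into Logstash, it gets a new UUID and a new document. A content-based fingerprint makes the document ID idempotent. A sketch, assuming the whole message field identifies the event (source, key, and the field names are choices to adapt, not part of the original config):

filter {
        if [type] == "threat" {
                fingerprint {
                        source => "message"
                        target => "[@metadata][fingerprint]"
                        method => "SHA1"
                        key => "some-static-key"
                }
        }
}

paired with document_id => "%{[@metadata][fingerprint]}" in the elasticsearch output, so re-indexing the same event overwrites the existing document instead of creating a duplicate.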

However, it is strange, because the ID looks like the old auto-generated kind, i.e. "_id": "AV5DJ6A8Ap8QZuUUMBEC", but the metadata shows something different. I expected the UUID to replace the ID.

"%{": {
            "@metadata": {
              "uuid": {
                "}": "46a45cc5-365d-40c5-af89-105d58bc6160"
              }
            }
          }
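The mangled field name in that output suggests an explanation: the fingerprint filter's target option takes a plain field reference, not a %{...} sprintf pattern, so "%{[@metadata][uuid]}" appears to have been taken literally, creating a field whose name begins with %{ rather than writing to [@metadata][uuid]. With nothing at [@metadata][uuid], the document_id setting cannot pick up the UUID. A corrected filter would be (a sketch, keeping the rest of the pipeline as shown above):

filter {
        if [type] == "threat" {
                fingerprint {
                        target => "[@metadata][uuid]"
                        method => "UUID"
                }
        }
}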

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.