How to find or query duplicate offsets?

I have duplicate records in my indexes. How can I find a list of the duplicate records?

The duplicate records have the same offset. Can you suggest a query that lists the offsets that occur more than once?

Or is there any other way to find duplicate records?

What do you mean by offset?

The log.offset field in Kibana. I see that a few records in Kibana are duplicates with the same log.offset value.

Setup brief: Filebeat -> 2x Logstash -> Elasticsearch

# Filebeat output
output.logstash:
  hosts: ["host1:5044", "host2:5044"]
  loadbalance: true

This happens only for a few records, around 100-500 out of 1 million. How can I fix the existing duplicates and avoid new ones?

You can use the fingerprint filter in Logstash to create your own Elasticsearch document _id based on the timestamp and the offset; that way duplicates will update the original document instead of creating a new one.
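
A minimal sketch of such a filter, assuming your events carry @timestamp and the Filebeat field [log][offset]; the exact field list is an assumption:

filter {
  fingerprint {
    # Hash the fields that uniquely identify an event. @timestamp plus
    # [log][offset] is an assumption; include [log][file][path] as well
    # if the same offset can repeat across different files.
    source => ["@timestamp", "[log][offset]"]
    concatenate_sources => true
    method => "SHA256"
    target => "[@metadata][fingerprint]"
  }
}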

Why is there a duplication issue in the first place? And how do I find the existing duplicate data? Can you suggest a query to find "log.offset" values with a count greater than 1?

Logstash tries to deliver every document at least once, so in some cases it may duplicate the data; this behavior is expected.

To avoid duplicate events in Elasticsearch you need to use a custom _id instead of letting Elasticsearch choose the value of the _id field.

This can be done using an id from your documents, if one exists, or by creating one based on one or more fields of the documents using the fingerprint filter, as sketched below.
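
A sketch of the matching output side, assuming the fingerprint filter above stored its hash in [@metadata][fingerprint]; hosts and index are placeholders:

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "your-index"
    # Reuse the fingerprint as the document _id so a re-delivered event
    # overwrites the original instead of creating a duplicate.
    document_id => "%{[@metadata][fingerprint]}"
  }
}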

Check these blog posts for tips on how to deal with duplicates in Logstash and Elasticsearch.

To find the duplicate events you need to run an aggregation query.

Something like this:

GET your-index/_search
{
  "size": 0,
  "aggs": {
    "duplicates": {
      "terms": {
        "field": "log.offset",
        "min_doc_count": 2
      }
    }
  }
}
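
Note that a terms aggregation returns only the top 10 buckets by default. A variant of the query above raises the bucket count and adds a top_hits sub-aggregation so you can inspect the duplicate documents themselves (the _source field list is illustrative):

GET your-index/_search
{
  "size": 0,
  "aggs": {
    "duplicates": {
      "terms": {
        "field": "log.offset",
        "min_doc_count": 2,
        "size": 1000
      },
      "aggs": {
        "docs": {
          "top_hits": {
            "size": 5,
            "_source": ["@timestamp", "log.offset", "log.file.path"]
          }
        }
      }
    }
  }
}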
