Version conflict, document already exists (current version [1])

I am running Metricbeat on a few systems and sending that data to a proxy server.
The proxy then sends the data to two Logstash servers.

Logstash then parses this and stores the records in Elasticsearch.
I am creating my own _id for each record (_id = hostname+timestamp+metricsetname).
After upgrading to 8.x and starting to use data streams, I started getting this error.
(And yes, I need to use my own _id.)

I checked quite a few times and the record exists in Elasticsearch.

But why is this happening? How can I avoid it? Is something wrong in my setup?
It seems like the two Logstash servers are receiving the same record.

[2023-05-10T17:26:53,457][WARN ][logstash.outputs.elasticsearch][metricbeat_v6_7][ab9f97bc010420ba0b4c62236e64770ec3bd768c9fb6731267f70b2ee75a50ce] Failed action {:status=>409, :action=>["create", {:_id=>"red2906048_2023-05-10T16:26:54.593Z_filesystem_/s0", :_index=>"g2insight-sys-8.5.3", :routing=>nil}, {"metricset"=>{"name"=>"filesystem", "period"=>60000}, "agent"=>{"version"=>"8.5.3"}, "system"=>{"type"=>"cpu", "filesystem"=>{"free_files"=>287568367, "available"=>2934226219008, "type"=>"xfs", "used"=>{"pct"=>0.0031, "bytes"=>9089617920}, "device_name"=>"/dev/md1", "total"=>2943315836928, "free"=>2934226219008, "mount_point"=>"/s0", "files"=>287573568}}, "service"=>{"type"=>"system"}, "@version"=>"1", "tags"=>["beats_input_raw_event"], "@timestamp"=>2023-05-10T16:26:54.593Z, "host"=>{"name"=>"red2906048"}}], :response=>{"create"=>{"_index"=>".ds-insight-sys-8.5.3-2023.05.10-000237", "_id"=>"red2906048_2023-05-10T16:26:54.593Z_filesystem_/s0", "status"=>409, "error"=>{"type"=>"version_conflict_engine_exception", "reason"=>"[red2906048_2023-05-10T16:26:54.593Z_filesystem_/s0]: version conflict, document already exists (current version [1])", "index_uuid"=>"hoXYGGTYSHGlFnF-_K6h-A", "shard"=>"0", "index"=>".ds-insight-sys-8.5.3-2023.05.10-000237"}}}}

Here is the Logstash output section:

output {
  elasticsearch {
     hosts => ["host1:9200","host2:9200"]
     index => "%{[@metadata][target_index]}"
     document_id => "%{[@metadata][id]}"
     user => "${elastic_user}"
     password => "${elastic_password}"
     action => "create"
     manage_template => false
   }
}

Beats and Logstash both guarantee at-least-once delivery, so it's entirely possible for duplicates to happen if one part of the stack isn't sure a previous delivery succeeded.

Data streams are append-only, as per Data streams | Elasticsearch Guide [8.7] | Elastic:

Data streams are designed for use cases where existing data is rarely, if ever, updated. You cannot send update or deletion requests for existing documents directly to a data stream.

That brings us to ask: is this causing issues, or are you simply trying to clean up your logs?

Clearly your _id is not unique; not sure if that is intentional or not.

hmmmm :slight_smile:

That won't work... create is append-only, and you are trying to update in this specific case.

Is your target a data stream or index?

If it's a data stream, update will never work; data streams by definition are append-only / immutable.

So you can only use an index...

Then you will need to use doc_as_upsert

Careful lots of traps...
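
For reference, a minimal sketch of what doc_as_upsert might look like against a regular index (not a data stream). The index name here is just a placeholder, and whether an upsert actually makes sense depends on your use case:

output {
  elasticsearch {
     hosts => ["host1:9200","host2:9200"]
     # a plain index, not a data stream (placeholder name)
     index => "my-metrics-index"
     document_id => "%{[@metadata][id]}"
     user => "${elastic_user}"
     password => "${elastic_password}"
     # update + doc_as_upsert: create the document if it does not exist,
     # otherwise apply the event as an update instead of failing with 409
     action => "update"
     doc_as_upsert => true
     manage_template => false
   }
}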

@stephenb and @warkolm

This is Metricbeat data and it is append-only. Zero updates.

I am trying to figure out why I have so many of these warnings. My data is coming in fine; I can see pretty much everything. But because I have thousands of these messages, I want to make sure it is not something else that I am overlooking.

The _id is 100% unique because hostname + timestamp (when the Metricbeat client sends the data) + metricset.name can't be duplicated. Metricbeat is sending data every minute, and hence it will always be unique.

These two statements are mutually exclusive...
The error message says the id is not unique...

_id=>"red2906048_2023-05-10T16:26:54.593Z_filesystem_/s0

So either your _id is not unique or you are somehow sending the same data / _id multiple times

Hi,
That is exactly it. This record has already been pushed to Elasticsearch by the other Logstash,

and the second Logstash instance is also getting this same record.

Why are both Logstash instances getting the same record?

Yes, I understand that because it's a data stream it won't update, and I don't want to update.
Somehow I am seeing the same record on both Logstash instances. Is it an HAProxy problem?

Ok now we are getting somewhere... I don't know :slight_smile:

Are you sending the beats to more than 1 logstash?

Are you running more than 1 logstash pipeline?

Share your complete Metricbeat and Logstash configs and perhaps we can help...

Somewhere I suspect your Beats messages are getting into two Logstash pipelines; that is where you need to focus...

Are you using pipelines.yml? Share that too.

Logstash pipeline configs in the same directory get concatenated, and then all messages go to all of the outputs. Most likely it is something like that... unless you independently name them / separate them in pipelines.yml.
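
A minimal sketch of what separating them in pipelines.yml might look like (the pipeline ids and config paths here are placeholders, and the file location can vary by install):

# e.g. /etc/logstash/pipelines.yml
- pipeline.id: metricbeat
  path.config: "/etc/logstash/conf.d/metricbeat/*.conf"
- pipeline.id: other_data
  path.config: "/etc/logstash/conf.d/other/*.conf"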

I have a few hundred systems running Beats, with only system.yml enabled and only cpu/network/core/filesystem etc... just regular metrics.

They send this to haproxy1:5044.

HAProxy then sends this to two Logstash servers; both servers have the exact same config and run via systemctl.

input {
    beats {
        port => "5044"
    }
}

The output I already listed in the initial post, but here it is again:

output {
  elasticsearch {
     hosts => ["host1:9200","host2:9200"]
     index => "%{[@metadata][target_index]}"
     document_id => "%{[@metadata][id]}"
     user => "${elastic_user}"
     password => "${elastic_password}"
     action => "create"
     manage_template => false
   }
}

This is how I create the _id:

mutate { add_field => { "[@metadata][id]" => "%{[host][name]}_%{@timestamp}_%{[metricset][name]}" } }

What is happening here is that one Logstash somehow gets a record, processes it, and saves it in Elasticsearch,
and the second Logstash also receives the same record.

This is not happening for all records, only some, and there is no pattern. Sometimes it is one system, sometimes a different metric and/or a different system; no fixed pattern that I can see.

Here is the HAProxy setup:

frontend logstash_metricbeat
        bind myproxy_server:5044
        mode tcp
        timeout client      120s
        default_backend logstash_metricbeat
backend logstash_metricbeat
        mode tcp
        timeout server      100s
        timeout connect     100s
        timeout queue       100s
        balance leastconn
        server logstash_server1 <ipaddr>:5044 check weight 1
        server logstash_server2 <ipaddr>:5044 check weight 1

I know what you keep telling me... :slight_smile: but we have to ask questions to help.

I asked the other questions because we often see folks with more than one Logstash config in a single directory, which then get concatenated together, which results in duplications...

It looks to me like your load balancer / proxy is sending events to more than one Logstash... not something we can really help you with... that is an LB / proxy issue, which I am not an expert on.

I suspect that if you tested with a single Logstash, no LB / proxy, you would not have the issue.
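
For example, pointing one test host straight at a single Logstash instance, bypassing HAProxy, would rule the proxy in or out. A sketch, assuming the Beats ship via output.logstash (the hostname is a placeholder):

# metricbeat.yml on a single test host: skip the proxy, send directly to one Logstash
output.logstash:
  hosts: ["logstash_server1:5044"]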

How do you know that? If every event is sent to both Logstash instances, then it is entirely possible that most will get written to Elasticsearch twice, setting the [version] of the document to 2.

Indexing an event in Elasticsearch is a two-phase transaction. You only get that exception if the phases for the indexing of two documents with the same id overlap. Otherwise one simply overwrites the other.

Do you see documents in elasticsearch with version 2?
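
One way to check (a sketch; the backing-index pattern and _id below are taken from the warning earlier in the thread) is to search with "version": true and look at _version in the hits:

GET .ds-insight-sys-8.5.3-*/_search
{
  "version": true,
  "query": {
    "ids": { "values": ["red2906048_2023-05-10T16:26:54.593Z_filesystem_/s0"] }
  }
}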

He is using a data stream, which is create / append-only, so there should be no version 2... If there is, that is a whole other issue. :slight_smile:

@Badger there are no records with version 2.

The reason I am using multiple Logstash instances is the large volume of records; one is not able to handle it. We did test that in the beginning (a few issues with open files, memory overruns, etc.); after going to multiple Logstash instances that problem was gone.

The reason I know only a few records are being sent to both is that in the logstash-plain.log file I get a few hundred messages compared to many thousands of total records.

I picked many random warning messages and checked them against records that are in Elasticsearch, and they are definitely there. That means these warnings are legit.

Thank you guys for keeping up with me.

Let me also start looking into the HAProxy side and see if there is any setting related to this.
