NOTE: After doing additional research and managing to find the code preventing updates, I've revised this post to be more accurate and succinct.
ISSUE: When using the beat.Client to publish [bulk] beat.Events to the Elasticsearch output, the client does not allow updates by way of including the document id in the Meta map.
Sample code:
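// client is the beat.Client the beat uses to publish events.
client.Publish(beat.Event{
	Fields: common.MapStr{
		"field1": "abc",
	},
	Timestamp: time.Now(),
	Meta: common.MapStr{
		// Used by the Elasticsearch output as the document _id.
		"id": "123",
	},
})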
The first time the document is published, it is successfully created. The second time the document is published (with updated values), an error is generated indicating "version conflict, document already exists".
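To make the difference concrete, here is a toy in-memory model of how the two bulk actions treat an _id that already exists. The types and methods are hypothetical and only track document versions; they are not Elasticsearch code. 'create' fails with a 409 version conflict on the second publish, while 'index' creates the document and then replaces it, incrementing _version, which matches the status codes and versions in the bulk responses in this post.

```go
package main

import "fmt"

// itemResult loosely mirrors one item of a bulk response.
type itemResult struct {
	Result  string // "created" or "updated"; empty on error
	Error   string // error type; empty on success
	Status  int    // HTTP-style status code
	Version int    // document _version after the operation
}

// toyIndex is a hypothetical stand-in for one index: it only records
// the current version of each document id.
type toyIndex struct {
	versions map[string]int
}

// create models the bulk 'create' action: it fails if the id exists.
func (ix *toyIndex) create(id string) itemResult {
	if v, ok := ix.versions[id]; ok {
		return itemResult{Error: "version_conflict_engine_exception", Status: 409, Version: v}
	}
	ix.versions[id] = 1
	return itemResult{Result: "created", Status: 201, Version: 1}
}

// index models the bulk 'index' action: create the document if it is
// missing, otherwise replace it and bump its version.
func (ix *toyIndex) index(id string) itemResult {
	v, ok := ix.versions[id]
	if !ok {
		ix.versions[id] = 1
		return itemResult{Result: "created", Status: 201, Version: 1}
	}
	ix.versions[id] = v + 1
	return itemResult{Result: "updated", Status: 200, Version: v + 1}
}

func main() {
	withCreate := &toyIndex{versions: map[string]int{}}
	fmt.Printf("%+v\n", withCreate.create("123")) // first publish: created
	fmt.Printf("%+v\n", withCreate.create("123")) // second publish: 409 conflict

	withIndex := &toyIndex{versions: map[string]int{}}
	fmt.Printf("%+v\n", withIndex.index("123")) // first publish: created, version 1
	fmt.Printf("%+v\n", withIndex.index("123")) // second publish: updated, version 2
}
```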
I've tracked it down to this code in github.com/elastic/beats/libbeat/outputs/elasticsearch/client.go (version 6.4):
417    if id != "" {
418        return bulkCreateAction{meta}, nil
419    }
420    return bulkIndexAction{meta}, nil
If an id is provided, the client always sends a 'create' action. If I comment out lines 417-419, I'm able to update documents by including the document id in the Meta map: the client then sends an 'index' action, and the bulk API decides whether to create or update the document, just as it does with the REST calls below.
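A simplified sketch of that selection logic is below. The struct types are stand-ins modeled on the names in client.go, not the real libbeat definitions, and allowUpdates is a hypothetical switch that emulates commenting out lines 417-419. It prints the bulk action line that would be generated for an event whose Meta contains "id": "123".

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Simplified stand-ins for the bulk metadata types in
// libbeat/outputs/elasticsearch/client.go (not the real definitions).
type bulkEventMeta struct {
	Index   string `json:"_index"`
	DocType string `json:"_type"`
	ID      string `json:"_id,omitempty"`
}

type bulkCreateAction struct {
	Create bulkEventMeta `json:"create"`
}

type bulkIndexAction struct {
	Index bulkEventMeta `json:"index"`
}

// selectAction mirrors lines 417-420: an event with an id becomes a
// 'create' action. allowUpdates (hypothetical) skips that branch, which
// has the same effect as commenting out lines 417-419.
func selectAction(meta bulkEventMeta, allowUpdates bool) interface{} {
	if meta.ID != "" && !allowUpdates {
		return bulkCreateAction{meta}
	}
	return bulkIndexAction{meta}
}

// actionLine renders the selected action as one bulk action line.
func actionLine(meta bulkEventMeta, allowUpdates bool) (string, error) {
	b, err := json.Marshal(selectAction(meta, allowUpdates))
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	meta := bulkEventMeta{Index: "my-test-index", DocType: "doc", ID: "123"}
	for _, allow := range []bool{false, true} {
		line, err := actionLine(meta, allow)
		if err != nil {
			panic(err)
		}
		fmt.Println(line)
	}
}
```

With allowUpdates false the line starts with "create", so a second publish of the same id conflicts; with it true the line starts with "index", matching the REST calls in this post.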
POST _bulk
{ "index" : { "_index" : "my-test-index", "_type" : "doc", "_id" : "123" } }
{ "field1": "abc" }

{
  "took": 1650,
  "errors": false,
  "items": [
    {
      "index": {
        "_index": "my-test-index",
        "_type": "doc",
        "_id": "123",
        "_version": 1,
        "result": "created",
        "_shards": {
          "total": 2,
          "successful": 2,
          "failed": 0
        },
        "_seq_no": 0,
        "_primary_term": 1,
        "status": 201
      }
    }
  ]
}
POST _bulk
{ "index" : { "_index" : "my-test-index", "_type" : "doc", "_id" : "123" } }
{ "field1": "xyz" }

{
  "took": 76,
  "errors": false,
  "items": [
    {
      "index": {
        "_index": "my-test-index",
        "_type": "doc",
        "_id": "123",
        "_version": 2,
        "result": "updated",
        "_shards": {
          "total": 2,
          "successful": 2,
          "failed": 0
        },
        "_seq_no": 1,
        "_primary_term": 1,
        "status": 200
      }
    }
  ]
}
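For reference, the request bodies above are newline-delimited JSON: one action line followed by one source line, each terminated by a newline (including the last). The Go sketch below builds such a body with encoding/json; bulkItem, bulkActionMeta and buildBulkBody are hypothetical helpers, and actually sending the body (for example with net/http to the _bulk endpoint, Content-Type application/x-ndjson) is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// bulkItem is a hypothetical description of one document to send
// with an 'index' action.
type bulkItem struct {
	Index  string
	Type   string
	ID     string
	Source map[string]interface{}
}

// bulkActionMeta uses a struct (not a map) so the keys keep this order.
type bulkActionMeta struct {
	Index string `json:"_index"`
	Type  string `json:"_type"`
	ID    string `json:"_id"`
}

// buildBulkBody renders items as an NDJSON body for POST _bulk:
// an action line, then the document source, each ending in "\n".
func buildBulkBody(items []bulkItem) (string, error) {
	var sb strings.Builder
	for _, it := range items {
		action := map[string]bulkActionMeta{
			"index": {Index: it.Index, Type: it.Type, ID: it.ID},
		}
		for _, v := range []interface{}{action, it.Source} {
			b, err := json.Marshal(v)
			if err != nil {
				return "", err
			}
			sb.Write(b)
			sb.WriteByte('\n')
		}
	}
	return sb.String(), nil
}

func main() {
	// Two separate requests with the same _id, as in the calls above.
	for _, value := range []string{"abc", "xyz"} {
		body, err := buildBulkBody([]bulkItem{{
			Index:  "my-test-index",
			Type:   "doc",
			ID:     "123",
			Source: map[string]interface{}{"field1": value},
		}})
		if err != nil {
			panic(err)
		}
		fmt.Print(body)
	}
}
```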
My question is: is this a bug or a feature?
Is it a feature, i.e., is there a rationale for the client preventing [bulk] updates? For example, perhaps the order in which updates to the same document are applied cannot be guaranteed to match the order in which they were published. Or is it a bug, possibly left over from code that made sense given the features supported in earlier versions?
I'd like to use the Bulk API's ability to do upserts via the 'index' action through the beat.Client, but I'd like to understand whether this was prevented for a still-valid reason before submitting an issue against elastic/beats.
Thanks.