Conditional update based on priority in Elasticsearch

Ahmed_Sharara · January 24, 2021, 3:07pm

Hello,

I'm using logstash to index data from Kafka to Elasticsearch. There are multiple events with the same id. I'm using upsert in logstash to update existing documents with the new values based on the _id.

sample input (in order):

{"id": 5, "status": "pending", "customerID": 7}
{"id": 5, "status": "processed", "processordID": 12}

doc in elasticsearch:
{"id": 5, "status": "processed", "customerID": 7, "processordID": 12}

We have some anomalies where the most recent status would have a lower priority than the original doc.

example (in order):

{"id": 5, "status": "processed"}
{"id": 5, "status": "pending"}

Which leaves the document with the lower priority status "pending".

What I need is some sort of preprocessing in elasticsearch that prevents the update of the document (noop) in case of a lower priority status tries to overwrite a higher priority one.

Is there someway to do preprocessing of data in elasticsearch?

Logstash configuration:

input {
 kafka {
		bootstrap_servers => "kafka:9092"
    topics => "events"
    group_id => "events"
    client_id => "events"
    decorate_events => true
  }
}
... some filters
output {
	elasticsearch {
		index => "events"
		hosts => "http://127.0.0.1:9200"
		document_id => "%{id}"
		action => "update"
		doc_as_upsert => true
	}
}

warkolm · January 24, 2021, 11:40pm

You could set explicit versions in your documents matching the priority you define - https://www.elastic.co/guide/en/elasticsearch/reference/7.10/version.html.

That way the higher priority won't be overwritten with a lower one.

Ahmed_Sharara · January 25, 2021, 1:00am

I don't have this datatype available in my installation. I am using the OSS version of elasticsearch.

Is there any other way to achieve a similar behavior without using the version datatype?

warkolm · January 25, 2021, 1:00am

You do, it's a core part of Elasticsearch.

Ahmed_Sharara · January 25, 2021, 1:04am

When I tried the exact example mapping in the documentation I got:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "mapper_parsing_exception",
        "reason" : "No handler for type [version] declared on field [my_version]"
      }
    ],
    "type" : "mapper_parsing_exception",
    "reason" : "Failed to parse mapping [_doc]: No handler for type [version] declared on field [my_version]",
    "caused_by" : {
      "type" : "mapper_parsing_exception",
      "reason" : "No handler for type [version] declared on field [my_version]"
    }
  },
  "status" : 400
}

I also found it part of the Basic subscription here:
Elastic Subscriptions

So, I deduced that I don't have it.

warkolm · January 25, 2021, 1:10am

Apologies, I linked to the wrong doc page.
See https://www.elastic.co/guide/en/elasticsearch/reference/7.10/docs-index_.html and the reference to version there.

Ahmed_Sharara · January 25, 2021, 3:06am

Thanks for the suggestion.

I've tried using the versioning. Unfortunately, it doesn't work with the update API using Logstash.

I have to use the update API since I have incremental data.
e.g.

{"id": 5, "status": "pending", "customerID": 7}
{"id": 5, "status": "processed", "processordID": 12}

doc in elasticsearch:
{"id": 5, "status": "processed", "customerID": 7, "processordID": 12}

If I use the index API, I'll have a missing field:
{"id": 5, "status": "processed", "processordID": 12}

Logstash Error:

[2021-01-25T02:57:53,886][ERROR][logstash.outputs.elasticsearch][main][375cd2bcc47962e4aa3a8e9f0373d50720bbe90b85cd9126eee85cd4adaa71ed] Encountered a retryable error. Will Retry with exponential backoff  {:code=>400, :url=>"http://127.0.0.1:9200/_bulk", :body=>"{\"error\":{\"root_cause\":[{\"type\":\"illegal_argument_exception\",\"reason\":\"Update requests do not support versioning. Please use `if_seq_no` and `if_primary_term` instead\"}],\"type\":\"illegal_argument_exception\",\"reason\":\"Update requests do not support versioning. Please use `if_seq_no` and `if_primary_term` instead\"},\"status\":400}"}

warkolm · January 25, 2021, 4:13am

Did you try that?

Ahmed_Sharara · January 25, 2021, 4:15am

I can't see any settings for these params in the Logstash Output Plugin:

system · February 22, 2021, 4:16am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Doc_as_upsert along with script in Elasticsearch output plugin Logstash	1	1210	February 22, 2021
Logstash update if id matches Logstash	6	490	July 17, 2019
ConditionalUpdate Logstash	2	452	July 6, 2017
Conditional updating of existing record Logstash	5	520	December 25, 2018
Conditional Upserts? Has this been improved yet? Elasticsearch	1	686	July 25, 2017

Conditional update based on priority in Elasticsearch

Related topics