Scripted Upsert with Logstash ES Output Plugin - overwrites array data with latest event instead of updating it on reconnection

VIVEK_SHARMA3 · May 14, 2020, 4:49am

Scenario: We are appending objects to an array in Elasticsearch through Logstash's Elasticsearch output plugin. Both Logstash and Elasticsearch are on version 6.8. The update is a scripted upsert that uses a script stored in Elasticsearch. Here is the stored script:

    "add_script" : {
        "lang" : "painless",
        "source" : "if (ctx._source.tags != null) { ctx._source.tags.add(params.event.get('tags')[0]) } else { ctx._source.tags = params.event.get('tags') }"
    }
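To make the two branches concrete, here is a rough Python model of the stored script. It assumes ctx._source and params.event behave like plain dicts; the function name add_script is just borrowed from the stored script id. Note that the existing-array branch appends only the first element of the event's tags, while the else branch sets the field to the event's whole array.

```python
from typing import Any


def add_script(source: dict[str, Any], event: dict[str, Any]) -> None:
    """Python model of the stored Painless script; mutates `source` in place."""
    # ctx._source.tags != null: a missing key and an explicit null both count as null.
    if source.get("tags") is not None:
        # Existing array: append only the FIRST tag of the event (tags[0]).
        source["tags"].append(event["tags"][0])
    else:
        # No tags yet: the event's whole tags array becomes the field value.
        source["tags"] = list(event["tags"])


if __name__ == "__main__":
    doc = {"cid": 1, "tags": [{"id": 1, "tag": "test_tag1"}]}
    add_script(doc, {"tags": [{"id": 4, "tag": "test_tag4"}]})
    print([t["id"] for t in doc["tags"]])  # [1, 4]
```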

The Logstash output configuration is:

    output {
        elasticsearch {
            action => "update"
            hosts => "*****"
            index => "%{index}"
            document_id => "%{cid}"
            scripted_upsert => true
            upsert => ""
            script_lang => ""
            script_type => "indexed"
            script => "add_script"
            timeout => 120
        }
    }
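As far as we understand the 6.x output plugin, with script_type => "indexed" and scripted_upsert => true, each event becomes a bulk update item roughly like the one built below. The stored script is referenced by id, the whole event is passed as params.event (which is why the script reads params.event), and the upsert document is empty. This is a hedged sketch of the request shape, not the plugin's code: bulk_update_lines is a hypothetical helper, and metadata such as _type (still used in 6.x) and retry_on_conflict is omitted.

```python
import json
from typing import Any


def bulk_update_lines(index: str, doc_id: str, event: dict[str, Any]) -> str:
    """Hypothetical helper: approximate NDJSON for one scripted-upsert bulk item."""
    # Action/metadata line (_type and retry_on_conflict omitted in this sketch).
    action = {"update": {"_index": index, "_id": doc_id}}
    body = {
        # Stored script referenced by id; the whole event is passed as params.event.
        "script": {"id": "add_script", "params": {"event": event}},
        "scripted_upsert": True,
        # Empty upsert document: if the _id does not exist, the script runs against {}.
        "upsert": {},
    }
    # The bulk API expects newline-delimited JSON, including a trailing newline.
    return json.dumps(action) + "\n" + json.dumps(body) + "\n"


if __name__ == "__main__":
    event = {"cid": 1, "tags": [{"id": 4, "tag": "test_tag4"}]}
    print(bulk_update_lines("my-index", "1", event), end="")
```

The empty upsert document matters for the question below: whenever Elasticsearch does not find the document id, the script starts from an empty source and therefore takes the else branch.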

With this setup, we push each tag object into the existing tags array if it exists, and create a new tags array if it doesn't. Simple stuff so far!

Issue: During load testing, whenever the ES endpoint becomes momentarily unavailable (the cause of that is a topic for a later discussion) or Logstash restarts with a few events still in its queue, the moment the connection is restored the script appears to ignore the != null check and overwrites the entire tags array. For instance, if the tags array initially looks like this in ES:

    {
        "cid": 1,
        "tags": [
            { "id": 1, "tag": "test_tag1" },
            { "id": 2, "tag": "test_tag2" },
            { "id": 3, "tag": "test_tag3" }
        ]
    }

and the last event that came in for an update was

    "tags": [
        { "id": 4, "tag": "test_tag4" }
    ]

then, instead of the new tag being appended to the array, the final document looks like this:

    {
        "cid": 1,
        "tags": [
            { "id": 4, "tag": "test_tag4" }
        ]
    }

This happens only when the connection to ES is restored: as mentioned earlier, either after Logstash gets a host unreachable error from ES, or after Logstash restarts while some events are still in its queue and those events are applied as soon as it starts.

So far we have tried the following, with no luck:

- Reducing the bulk size and the number of workers in the Logstash pipeline, in case we were overwhelming ES.
- Increasing retry_interval from 2s to 30s. Our theory was that right after Logstash reconnects to ES and the script executes, there is a window in which ES returns no matching document, so the script sees the tags field as empty, runs the else branch, and overwrites the entire array. The longer interval was meant to give ES some time to warm up.
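The theory in the second item can be modelled in a few lines of Python. This is a simplified, hypothetical model of a scripted upsert with an empty upsert document (scripted_upsert and the in-memory store dict are stand-ins, not Elasticsearch code). When the document is found, the script sees its stored _source and appends; when it is not found, the script starts from an empty source, takes the else branch, and the result becomes the whole document. In other words, an array reduced to the latest tag is what you would expect if the script saw no existing tags field at that moment.

```python
import copy
from typing import Any

Doc = dict[str, Any]


def add_script(source: Doc, event: Doc) -> None:
    # Same two branches as the stored Painless script.
    if source.get("tags") is not None:
        source["tags"].append(event["tags"][0])
    else:
        source["tags"] = list(event["tags"])


def scripted_upsert(store: dict[str, Doc], doc_id: str, event: Doc) -> Doc:
    """Hypothetical model: run the script on the existing _source, or on {} if missing."""
    if doc_id in store:
        # Document found: the script works on a copy of its current _source.
        source = copy.deepcopy(store[doc_id])
    else:
        # Document not found: with scripted_upsert and an empty upsert doc,
        # the script starts from {} and therefore takes the else branch.
        source = {}
    add_script(source, event)
    store[doc_id] = source
    return source


if __name__ == "__main__":
    existing = {"1": {"cid": 1, "tags": [{"id": i, "tag": f"test_tag{i}"} for i in (1, 2, 3)]}}
    event = {"cid": 1, "tags": [{"id": 4, "tag": "test_tag4"}]}
    print(len(scripted_upsert(existing, "1", event)["tags"]))  # 4
    print(scripted_upsert({}, "1", event))  # {'tags': [{'id': 4, 'tag': 'test_tag4'}]}
```

Note that on the not-found path the script writes only tags, so in this model the resulting document has no cid field unless something else sets it.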

We need help as soon as possible, as we have had to halt a major product feature release because of this issue and cannot find a way around it. Why is the null check if (ctx._source.tags != null) not honoured when Logstash reconnects? Can we call this a bug in Elasticsearch?

system · June 11, 2020, 4:49am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
