Elasticsearch Transform: Find Time Difference Across Documents

I send multiple log documents to Elasticsearch in the following format:

    {
        "resourceId" : 12345,
        "event" : "SUBMITTED/PROCESSING/SUCCESSFUL",
        "event_time" : "dateString"
    }

The resourceId is the same across all logs for a given resource.
For my metrics, I want to calculate the time taken between the SUBMITTED and SUCCESSFUL steps.

I am using an Elasticsearch transform for this use case (since the calculation spans different documents), along with a scripted metric to calculate the time difference.
I am able to group the documents by resourceId and do some calculations, but the results I get are incorrect. It looks like my scripts do not handle the case where the documents for the same resource are stored on different shards.

My aggregation looks like this:

    {
        "aggs": {
            "duration": {
                "scripted_metric": {
                    "init_script": " state.start = 0; state.end = 0; state.duration = 0",
                    "map_script": "if (doc['event.keyword'].value.equals(\"SUBMITTED\")) {state.start = doc.event_time.value.toInstant().toEpochMilli()} else if (doc['event.keyword'].value.equals(\"SUCCESS\")) { state.end = doc['event_time'].value.toInstant().toEpochMilli()} ",
                    "combine_script": "if (state.start != 0 && state.end!= 0) {(state.duration = state.end - state.start); return state.duration;} else { state.duration = -1; return state.duration; }",
                    "reduce_script": "double b = 0; for (a in states) { if (a != null) { b = a }} return b"
                }
            }
        }
    }

Can someone help me with how this can be achieved?
Thanks

PS: I don't use Logstash, so I can't use the elapsed filter. Ideally, I don't want to use any filters.

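The combine_script runs once per shard; only the reduce_script sees the results from all shards. If the SUBMITTED and SUCCESSFUL documents for a resource land on different shards, each shard's combine_script sees only one of the two timestamps and returns -1.

I suggest returning state from the combine_script and calculating the duration in the reduce_script, so that it covers the states from all shards.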
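As a rough illustration of that suggestion, the aggregation could be restructured along the following lines. This is an untested sketch, not a verified fix: it assumes the field names from the question (`event.keyword`, `event_time`), and it assumes the final event is stored as `SUCCESSFUL`, as in the sample document. The script in the question compares against `SUCCESS`, which would never match if the stored value is `SUCCESSFUL`, so adjust the string to whatever is actually indexed. The `-1` sentinel for an incomplete pair is kept from the question.

```json
{
  "aggs": {
    "duration": {
      "scripted_metric": {
        "init_script": "state.start = 0L; state.end = 0L",
        "map_script": "def ts = doc['event_time'].value.toInstant().toEpochMilli(); if (doc['event.keyword'].value == 'SUBMITTED') { state.start = ts } else if (doc['event.keyword'].value == 'SUCCESSFUL') { state.end = ts }",
        "combine_script": "return state",
        "reduce_script": "long start = 0L; long end = 0L; for (s in states) { if (s == null) { continue } if (s.start != 0L) { start = s.start } if (s.end != 0L) { end = s.end } } return (start != 0L && end != 0L) ? end - start : -1"
      }
    }
  }
}
```

Here each shard only records the timestamps it happens to see and hands its whole state to the reducer. The reducer then picks up the start from whichever shard saw SUBMITTED and the end from whichever shard saw SUCCESSFUL, and subtracts only when both are present.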
