I have a pivot transform that calculates a duration between two timestamped documents by grouping them, matching like documents, then aggregating the min timestamp, the max timestamp and using a script to calculate the duration between those two:
PUT _transform/transform-name
{
"source": {
"index": "source-index-name"
},
"dest" : {
"index" : "dest-index-name"
},
"pivot": {
"group_by": {
"linkId": { "terms": { "field": "linkId" }},
"string_metadata_hash": { "terms": { "field": "string_metadata_hash" }},
"string_calculation_job_type": {
"terms": {
"script": {
"source": """
def m = /^calculation_job_(.*)_([^_]+)$/.matcher(doc['eventType'].value);
if (m.matches()) {
return m.group(1);
}
"""
}
}
}
},
"aggregations": {
"metadata": {
"scripted_metric": {
"init_script": "state.doc = new HashMap()",
"map_script": "if (state.doc.isEmpty()){state.doc = new HashMap(params['_source'].metadata)}",
"combine_script": "return state.doc",
"reduce_script": "return states[0]"
}
},
"start": {
"min": {
"field": "timestamp"
}
},
"complete": {
"max": {
"field": "timestamp"
}
},
"duration_sec": {
"bucket_script": {
"buckets_path": {
"start": "start.value",
"complete": "complete.value"
},
"script": "return (params.complete - params.start)/1000;"
}
}
}
},
"sync": {
"time": {
"field": "timestamp",
"delay": "60s"
}
}
}
This matches event Type calculation_job_foo_start
with calculation_job_foo_complete
to calculate the duration between the two.
However, it also results in a document when there is only a calculation_job_foo_start
and no calculation_job_foo_complete
- which results in a start
and complete
times that are the same (both start times) and a duration_sec
of 0
Is there a way to prevent output to dest index until there is both a start and complete event type? In other words, filter the output?