Using Elasticsearch 6.2.x, I am failing to write a particular query. My data contains information about when an actor commenced one of several consecutive phases:
curl -XPUT http://localhost:9200/phases
curl -XPUT http://localhost:9200/phases/_mapping/_doc -H'Content-Type:application/json' -d'{
"properties": {
"actor": { "type": "keyword" },
"verb": { "type": "keyword" },
"object": { "type": "keyword" }
}
}'
curl -XPOST http://localhost:9200/phases/_doc -H'Content-Type:application/json' -d'{
"actor": "actor1",
"verb": "started",
"object": "phase2",
"timestamp": "2018-06-20T12:00:00.0000Z"
}'
curl -XPOST http://localhost:9200/phases/_doc -H'Content-Type:application/json' -d'{
"actor": "actor1",
"verb": "started",
"object": "phase1",
"timestamp": "2018-06-18T12:00:00.0000Z"
}'
curl -XPOST http://localhost:9200/phases/_doc -H'Content-Type:application/json' -d'{
"actor": "actor2",
"verb": "started",
"object": "phase1",
"timestamp": "2018-06-19T12:00:00.0000Z"
}'
What I'm trying to find out is how many actors are presently in which phase, where only the most recently started phase is considered for each actor (phase1:1, phase2:1 in the example). I know how to query for the phase each actor is presently in by combining a terms with a top_hits query as follows:
curl -XPOST http://localhost:9200/phases/_search -H'Content-Type:application/json' -d'{
"aggs": {
"_xs_valueagg": {
"terms": { "field": "actor" },
"aggs": {
"latest": {
"top_hits": {
"size": 1,
"sort": [
{ "timestamp": { "order": "desc" } }
],
"_source": {
"includes": [ "object" ]
}
}
}
}
}
}
}'
However, this gives results of the general form: actor1:phase2, actor2:phase1, etc. What I'm failing to achieve is to count the number of actors per phase across these buckets. Since this is a post-processing step on existing buckets I have attempted various forms of pipeline aggregations (stats and bucket_script in particular) but fail to (a) create an entirely new bucket structure and (b) count the occurrences of a categorical attribute.
Is there a way to query for the number of actors per phase, where only the most recently entered phase is considered per actor?