Bucket Selector Aggregation Script- access doc index field value

Hemrajsinh_Gharia · November 25, 2015, 7:03am

Hello,

I am trying to provide a solution for this stackoverflow question.

`PUT http://localhost:9200/test_index/test/_mapping/`


{
    "test": {
        "properties": {
            "date": {
                "type": "date",
                "format": "strict_date_optional_time||epoch_millis"
            },
            "status": {
                "type": "string",
                "index": "not_analyzed"
            },
            "version": {
                "type": "long"
            },
            "workFlowId": {
                "type": "long"
            }
        }
    }
}

Then index all the data shown in the question (10 documents in total), one by one. For example:

POST http://localhost:9200/test_index/test/1

{
    "date": "2015-11-01",
    "workFlowId": 1,
    "version": 1,
    "status": "In Progress"
}

I tried a Bucket Selector Aggregation with sub-aggregations as follows:

POST http://localhost:9200/test_index/test/_search?search_type=count

{
    "aggs": {
        "per_day": {
            "date_histogram": {
                "field": "date",
                "interval": "day"
            },
            "aggs": {
                "per_status": {
                    "terms": {
                        "field": "status"
                    },
                    "aggs": {
                        "max_version_per_workflow": {
                            "terms": {
                                "field": "workFlowId"
                            },
                            "aggs": {
                                "max_version": {
                                    "max": {
                                        "field": "version"
                                    }
                                },
                                "eod_bucket_filter": {
                                    "bucket_selector": {
                                        "buckets_path": {
                                            "maxVersionPerWorkFlow": "max_version"
                                        },
                                        "script": "2 >= maxVersionPerWorkFlow"
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}

This seems to work fine and gives the expected results. For each day, I need to find the total number of "In Progress" and "Completed" workflows, considering only the records that have the largest version up to that day. With that in mind, I am using the bucket selector as a filter. But as you can see, the script uses the static value 2. Instead, I need to compare against the document's version value. Here doc['title'].value is not working. Any suggestions on how I can achieve this?

Christian_Dahlqvist · November 25, 2015, 9:27am

Even though it is most likely possible to do this at query time in Elasticsearch, it may not scale very well, as most, if not all, records need to be considered in the query. If you know some of the queries you want to run, it can be very beneficial to shift some of the work to index time instead and index the data in multiple ways to efficiently support different types of queries.

In this case you could create a separate workflow-centric index which holds the latest state for each workflow. If you want the latest state for each workflow irrespective of time, this would be a single index, whereas you could use time-based daily indices if you want to get the correct status per day, as in this case.

Whenever you receive a new status, you index the raw record into the current index. This allows you to track the progress and analyse the task transitions as you currently do. In this index you can typically let Elasticsearch assign the document ID as no updates will take place. In addition to this you also index the record into a workflow-centric index with a unique identifier, e.g. workflow id, as the document id. If several updates come in for the same workflow, each will result in an update and the latest state will be preserved. Running aggregations across this index to find the current or latest state will be considerably more efficient and scalable as you only have a single record per workflow and do not need to filter out documents based on relationships to other documents.
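As a rough illustration of the dual-write idea above, here is a minimal in-memory Python sketch. It does not talk to Elasticsearch: `raw_index` and `workflow_index` are hypothetical stand-ins for the raw index and the daily workflow-centric indices, the sample records are made up, and updates are assumed to arrive in version order, so a plain overwrite keeps the latest state. Note that in this sketch a day only contains workflows that received an update on that day.

```python
from collections import Counter

# Hypothetical sample records, in arrival order; field names follow the mapping above.
records = [
    {"date": "2015-11-01", "workFlowId": 1, "version": 1, "status": "In Progress"},
    {"date": "2015-11-01", "workFlowId": 2, "version": 1, "status": "In Progress"},
    {"date": "2015-11-01", "workFlowId": 1, "version": 2, "status": "Completed"},
    {"date": "2015-11-02", "workFlowId": 2, "version": 2, "status": "Completed"},
]

raw_index = []       # stand-in for the raw index: append-only, IDs auto-assigned
workflow_index = {}  # stand-in for daily workflow-centric indices: (day, doc id) -> doc


def index_record(record):
    """Write one status update to both stand-in indices."""
    raw_index.append(record)
    # The workflow id acts as the document id, so a later update for the same
    # workflow on the same day overwrites the earlier one (assumes in-order arrival).
    workflow_index[(record["date"], record["workFlowId"])] = record


def status_counts_per_day():
    """Count statuses per day using only the workflow-centric documents."""
    counts = {}
    for (day, _workflow_id), doc in workflow_index.items():
        counts.setdefault(day, Counter())[doc["status"]] += 1
    return counts


for record in records:
    index_record(record)

counts = status_counts_per_day()
```

Because each workflow contributes at most one document per day, the per-day status count needs no comparison between documents, which is the property the query-time approach was trying to obtain with the bucket selector.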

Please have a look at this presentation on entity-centric indexing, which explains this further and gives a few other examples.

Hemrajsinh_Gharia · November 25, 2015, 11:07am

Thanks, Christian. Following your suggestion, I tried to provide a solution. Please take a look at this and correct me if I am wrong.
