Ok, understood - yes, scripted fields manipulate fields within a document. Thanks for providing an example - so if what you're after is the max delta (per user), then I believe you're going to have to delve into the world of pipeline aggregations.
Specifically, I think you're going to need to pipeline the following:
- a `date_histogram` (link) aggregation to bucket the data in intervals of time
- a `terms` (link) aggregation to separate the data per user
- a `max` (link) aggregation to find the max number of widgets per user (in a bucket)
- a `min` (link) aggregation to find the min number of widgets per user (in a bucket)
- a `bucket_script` (link) aggregation to find the difference between max and min, per user (in a bucket)
An example search aggregation (using a play dataset that can be found here):
```
POST farequote-*/_search
{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggregations": {
    "buckets": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "day",
        "time_zone": "UTC"
      },
      "aggregations": {
        "@timestamp": {
          "max": {
            "field": "@timestamp"
          }
        },
        "airlines": {
          "terms": {
            "field": "airline",
            "size": 200,
            "order": {
              "_count": "desc"
            }
          },
          "aggregations": {
            "max": {
              "max": {
                "field": "responsetime"
              }
            },
            "min": {
              "min": {
                "field": "responsetime"
              }
            },
            "max_delta": {
              "bucket_script": {
                "buckets_path": {
                  "maxval": "max",
                  "minval": "min"
                },
                "script": "params.maxval - params.minval"
              }
            }
          }
        }
      }
    }
  }
}
```
The output looks like:
```
{
  "took" : 61,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 86274,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "buckets" : {
      "buckets" : [
        {
          "key_as_string" : "2017-02-07T00:00:00.000Z",
          "key" : 1486425600000,
          "doc_count" : 17211,
          "@timestamp" : {
            "value" : 1.486511998E12,
            "value_as_string" : "2017-02-07T23:59:58.000Z"
          },
          "airlines" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "AWE",
                "doc_count" : 1718,
                "min" : {
                  "value" : 16.769500732421875
                },
                "max" : {
                  "value" : 23.477800369262695
                },
                "max_delta" : {
                  "value" : 6.70829963684082
                }
              },
              {
                "key" : "AAL",
                "doc_count" : 1715,
                "min" : {
                  "value" : 22.50950050354004
                },
                "max" : {
                  "value" : 182.12440490722656
                },
                "max_delta" : {
                  "value" : 159.61490440368652
                }
              },
              {
                "key" : "UAL",
                "doc_count" : 1158,
                "min" : {
                  "value" : 6.731100082397461
                },
                "max" : {
                  "value" : 13.200699806213379
                },
                "max_delta" : {
                  "value" : 6.469599723815918
                }
              },
              ...
```
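Incidentally, if the full response is more than you need, the standard `filter_path` request parameter can trim it down to just the bucket keys and deltas. A minimal usage sketch (same request body as the search above):

```
POST farequote-*/_search?filter_path=aggregations.buckets.buckets.key_as_string,aggregations.buckets.buckets.airlines.buckets.key,aggregations.buckets.buckets.airlines.buckets.max_delta
# ... followed by the same request body as the example above
```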
Assuming you can do all of the above, you'll then need to adapt it a little in order to allow your ML job to leverage these aggregations. See (link) and (link)
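To give a rough idea of what that adaptation could look like, here's a minimal sketch. The endpoint paths are 7.x-style `_ml` ones (older versions used `_xpack/ml`), and the job/datafeed names `max_delta_job` / `datafeed-max_delta_job` are hypothetical. Two things to note: the `max` sub-aggregation on `@timestamp` in the search above is there because the datafeed needs to know each bucket's latest timestamp, and the `terms` aggregation is renamed from `airlines` to `airline` so it matches the detector's `by_field_name`:

```
# Hypothetical job: detect anomalously large per-airline max/min deltas.
# summary_count_field_name must be "doc_count" when the datafeed uses aggregations.
PUT _ml/anomaly_detectors/max_delta_job
{
  "analysis_config": {
    "bucket_span": "1d",
    "summary_count_field_name": "doc_count",
    "detectors": [
      {
        "function": "max",
        "field_name": "max_delta",
        "by_field_name": "airline"
      }
    ]
  },
  "data_description": {
    "time_field": "@timestamp"
  }
}

# Hypothetical datafeed: same aggregation tree as the search above, with the
# terms aggregation renamed to "airline" to match the detector's by_field_name.
PUT _ml/datafeeds/datafeed-max_delta_job
{
  "job_id": "max_delta_job",
  "indices": [ "farequote-*" ],
  "aggregations": {
    "buckets": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "day",
        "time_zone": "UTC"
      },
      "aggregations": {
        "@timestamp": {
          "max": { "field": "@timestamp" }
        },
        "airline": {
          "terms": { "field": "airline", "size": 200 },
          "aggregations": {
            "max": { "max": { "field": "responsetime" } },
            "min": { "min": { "field": "responsetime" } },
            "max_delta": {
              "bucket_script": {
                "buckets_path": { "maxval": "max", "minval": "min" },
                "script": "params.maxval - params.minval"
              }
            }
          }
        }
      }
    }
  }
}
```

The `date_histogram` interval must divide evenly into the job's `bucket_span` (here both are a day), and the leaf aggregation names become the field names the detectors see - hence `max_delta` as the detector's `field_name`.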