Is there some way to get a stats aggregation on nested fields such that only the maximum of specific nested values is considered for the stats evaluation?
Mapping:
{
  "mappings": {
    "doc": {
      "properties": {
        "student_id": {
          "type": "long"
        },
        "test_scores": {
          "type": "nested",
          "properties": {
            "test_id": {
              "type": "long"
            },
            "score": {
              "type": "double"
            }
          }
        }
      }
    }
  }
}
Sample Data:
{
  "student_id": 1,
  "test_scores": [
    {
      "test_id": 101,
      "score": 90
    },
    {
      "test_id": 102,
      "score": 70
    },
    {
      "test_id": 103,
      "score": 80
    }
  ]
}
{
  "student_id": 2,
  "test_scores": [
    {
      "test_id": 101,
      "score": 80
    },
    {
      "test_id": 102,
      "score": 90
    },
    {
      "test_id": 103,
      "score": 85
    }
  ]
}
{
  "student_id": 3,
  "test_scores": [
    {
      "test_id": 101,
      "score": 30
    },
    {
      "test_id": 102,
      "score": 40
    },
    {
      "test_id": 103,
      "score": 55
    }
  ]
}
Filtering Query:
{
  "query": {
    "bool": {
      "should": [
        {
          "bool": {
            "must": [
              {
                "term": {
                  "student_id": 1
                }
              },
              {
                "nested": {
                  "path": "test_scores",
                  "query": {
                    "terms": {
                      "test_scores.test_id": [101]
                    }
                  }
                }
              }
            ]
          }
        },
        {
          "bool": {
            "must": [
              {
                "term": {
                  "student_id": 2
                }
              },
              {
                "nested": {
                  "path": "test_scores",
                  "query": {
                    "terms": {
                      "test_scores.test_id": [101, 103]
                    }
                  }
                }
              }
            ]
          }
        }
      ]
    }
  }
}
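Because every should clause has the same shape (a student_id term AND a nested terms query on that student's test_ids), the filter can be generated rather than hand-written. A minimal Python sketch, assuming the allowed test_ids per student are known on the client side; `build_filter_query` is a hypothetical helper, not part of any Elasticsearch client:

```python
import json


def build_filter_query(allowed_tests):
    """Build the bool/should filter from {student_id: [test_id, ...]}.

    Each should clause requires the student_id term AND a nested match on one
    of that student's test_ids, mirroring the hand-written query above.
    """
    should = []
    for student_id, test_ids in allowed_tests.items():
        should.append({
            "bool": {
                "must": [
                    {"term": {"student_id": student_id}},
                    {
                        "nested": {
                            "path": "test_scores",
                            "query": {"terms": {"test_scores.test_id": list(test_ids)}},
                        }
                    },
                ]
            }
        })
    return {"query": {"bool": {"should": should}}}


if __name__ == "__main__":
    # The two students from the example: student 1 -> [101], student 2 -> [101, 103]
    print(json.dumps(build_filter_query({1: [101], 2: [101, 103]}), indent=2))
```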
Requirement:
I need to find the min and max (a stats aggregation) of test_scores.score for the students matched by the above filtering query, such that only the maximum test_scores.score per student_id is considered.
Example:
From the documents matched by the above query:
doc:
student_id: 1
test_scores.test_id: 101
test_scores.score: 90
test_scores.score (To be considered for aggregation): 90
doc:
student_id: 2
test_scores.test_id: 101, 103
test_scores.score: 80, 85
test_scores.score (To be considered for aggregation): 85
Expected overall stats on test_scores.score:
max: 90
min: 85
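The expected numbers can be reproduced with a plain-Python reference of the intended semantics: take the maximum of the selected scores per student, then compute stats over those per-student maxima. This is only a local sketch over the sample data above; `ALLOWED`, `per_student_max` and `stats` are illustrative names, and the returned dict merely mirrors the fields of the stats response:

```python
# Sample documents from the question (only the fields used here).
DOCS = [
    {"student_id": 1, "test_scores": [
        {"test_id": 101, "score": 90}, {"test_id": 102, "score": 70}, {"test_id": 103, "score": 80}]},
    {"student_id": 2, "test_scores": [
        {"test_id": 101, "score": 80}, {"test_id": 102, "score": 90}, {"test_id": 103, "score": 85}]},
    {"student_id": 3, "test_scores": [
        {"test_id": 101, "score": 30}, {"test_id": 102, "score": 40}, {"test_id": 103, "score": 55}]},
]

# Which test_ids count for which student (the same conditions as the filtering query).
ALLOWED = {1: {101}, 2: {101, 103}}


def per_student_max(docs, allowed):
    """Return {student_id: max score over that student's allowed test_ids}."""
    result = {}
    for doc in docs:
        ids = allowed.get(doc["student_id"])
        if ids is None:
            continue  # student not matched by the filtering query
        scores = [t["score"] for t in doc["test_scores"] if t["test_id"] in ids]
        if scores:
            result[doc["student_id"]] = max(scores)
    return result


def stats(values):
    """count/min/max/avg/sum over the given values (one value per student)."""
    values = list(values)
    if not values:
        return {"count": 0, "min": None, "max": None, "avg": None, "sum": 0.0}
    total = float(sum(values))
    return {
        "count": len(values),
        "min": float(min(values)),
        "max": float(max(values)),
        "avg": total / len(values),
        "sum": total,
    }


if __name__ == "__main__":
    print(stats(per_student_max(DOCS, ALLOWED).values()))
```

With the sample data this gives count 2, min 85.0, max 90.0, avg 87.5 and sum 175.0.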
Findings:
After searching the web, I found this solution:
{
  "aggs": {
    "score_stats": {
      "stats": {
        "script": """
          if (doc['student_id'].value == 1) {
            return params._source['test_scores']
              .stream()
              .filter(nested -> nested.test_id == 101)
              .mapToDouble(nested -> nested.score)
              .max()
              .orElse(0);
          } else if (doc['student_id'].value == 2) {
            return params._source['test_scores']
              .stream()
              .filter(nested -> nested.test_id == 101 || nested.test_id == 103)
              .mapToDouble(nested -> nested.score)
              .max()
              .orElse(0);
          } else {
            return 0;
          }
        """
      }
    }
  },
  "query": {
    //filtering query copied here
  }
}
Response:
"aggregations" : {
  "score_stats" : {
    "count" : 2,
    "min" : 85.0,
    "max" : 90.0,
    "avg" : 87.5,
    "sum" : 175.0
  }
}
Problem:
While this solution works for the simple query above, my real queries can be much more complex, and this approach does not scale because there is an upper limit on script length.
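One way to keep the script itself from growing is to move the per-student test_id lists into the script's `params` and leave the Painless source constant. The sketch below is an untested assumption, not a verified fix: it builds the request body in Python, `build_stats_request` and the `allowed` parameter name are hypothetical, params keys are strings because JSON object keys must be, and it assumes `ids.contains(...)` sees matching numeric types for test_id values read from `_source` and from `params`. The request body still grows with the number of students; only the script source stays fixed.

```python
import json

# Painless source that stays the same no matter how many students are filtered.
# Untested assumption: params._source is readable from the aggregation script (as in
# the hand-written script above) and test_id values from _source and params compare
# equal via List.contains.
SCORE_SCRIPT = """
def ids = params.allowed[String.valueOf(doc['student_id'].value)];
if (ids == null) {
  return 0;
}
return params._source['test_scores']
  .stream()
  .filter(nested -> ids.contains(nested.test_id))
  .mapToDouble(nested -> nested.score)
  .max()
  .orElse(0);
"""


def build_stats_request(allowed_tests, query):
    """Build a search body: the filter `query` plus a stats agg driven by params.

    allowed_tests maps student_id -> iterable of test_ids to consider.
    """
    return {
        "size": 0,  # only the aggregation result is needed, not the hits
        "query": query,
        "aggs": {
            "score_stats": {
                "stats": {
                    "script": {
                        "lang": "painless",
                        "source": SCORE_SCRIPT,
                        # JSON object keys must be strings, so student ids are stringified;
                        # the script looks them up with String.valueOf(...).
                        "params": {
                            "allowed": {
                                str(student_id): list(test_ids)
                                for student_id, test_ids in allowed_tests.items()
                            }
                        },
                    }
                }
            }
        },
    }


if __name__ == "__main__":
    # match_all is only a placeholder; pass the real filtering query here.
    body = build_stats_request({1: [101], 2: [101, 103]}, {"match_all": {}})
    print(json.dumps(body, indent=2))
```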
I tried nested aggregations combined with a filter aggregation, but it seems that once inside the nested path, I can't combine conditions (AND/OR) on nested fields with conditions on non-nested fields.
Is there a better way to get a stats aggregation on nested fields such that only the maximum of specific nested values is considered for the stats evaluation?