Aggregation on specific values of nested fields per document


(Jai Prakash) #1

Is there a way to run a stats aggregation on nested fields so that, for each document, only the maximum of specific nested field values is considered in the stats evaluation?

Mapping:

{
    "mappings": {
        "doc": {
            "properties": {
                "student_id": {
                    "type": "long"
                },
                "test_scores": {
                    "type": "nested",
                    "properties": {
                        "test_id": {
                            "type": "long"
                        },
                        "score": {
                            "type": "double"
                        }
                    }
                } 
            }
        }
    }
}

Sample Data:

{
  "student_id": 1,
  "test_scores": [
    {
      "test_id": 101,
      "score": 90
    },
    {
      "test_id": 102,
      "score": 70
    },
    {
      "test_id": 103,
      "score": 80
    }
  ]
}

{
  "student_id": 2,
  "test_scores": [
    {
      "test_id": 101,
      "score": 80
    },
    {
      "test_id": 102,
      "score": 90
    },
    {
      "test_id": 103,
      "score": 85
    }
  ]
}

{
  "student_id": 3,
  "test_scores": [
    {
      "test_id": 101,
      "score": 30
    },
    {
      "test_id": 102,
      "score": 40
    },
    {
      "test_id": 103,
      "score": 55
    }
  ]
}

Filtering Query:

{
  "query": {
    "bool": {
      "should": [
        {
          "bool": {
            "must": [
              {
                "term": {
                  "student_id": 1
                }
              },
              {
                "nested": {
                  "path": "test_scores",
                  "query": {
                    "terms": {
                      "test_scores.test_id": [101] 
                    }
                  }
                }
              }
            ]
          }
        },
        {
          "bool": {
            "must": [
              {
                "term": {
                  "student_id": 2
                }
              },
              {
                "nested": {
                  "path": "test_scores",
                  "query": {
                    "terms": {
                      "test_scores.test_id": [101, 103] 
                    }
                  }
                }
              }
            ]
          }
        }
      ]
    }
  }
}
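The filtering query above repeats the same shape for every student: a `term` on `student_id` combined with a nested `terms` on `test_scores.test_id`. As real queries grow, it can help to generate that body from a single mapping instead of writing it by hand. The following is a minimal sketch that only builds the request body as a Python dict; the helper name `build_filter_query` and the `tests_by_student` mapping are hypothetical and not part of Elasticsearch.

```python
def build_filter_query(tests_by_student):
    """Build the bool/should filtering query shown above.

    tests_by_student maps a student_id to the test_ids of interest,
    e.g. {1: [101], 2: [101, 103]} (a hypothetical input format).
    """
    should = []
    for student_id, test_ids in tests_by_student.items():
        should.append({
            "bool": {
                "must": [
                    # Non-nested condition on the parent document.
                    {"term": {"student_id": student_id}},
                    # Nested condition on the test_scores objects.
                    {
                        "nested": {
                            "path": "test_scores",
                            "query": {
                                "terms": {"test_scores.test_id": list(test_ids)}
                            },
                        }
                    },
                ]
            }
        })
    return {"query": {"bool": {"should": should}}}


query = build_filter_query({1: [101], 2: [101, 103]})
```

Serializing `query` with `json.dumps` should give a body equivalent to the hand-written one, with one `should` clause per student.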

Requirement:

I need the min and max (stats aggregation) of test_scores.score for the students matched by the filtering query above, considering only the maximum matching test_scores.score per student_id.

Example:

For the documents matched by the query above:

doc: 
  student_id: 1
  test_scores.test_id: 101
  test_scores.score: 90
  test_scores.score (To be considered for aggregation): 90

doc:
  student_id: 2
  test_scores.test_id: 101, 103
  test_scores.score:    80, 85
  test_scores.score (To be considered for aggregation): 85

Expected overall stats on test_scores.score:
max: 90
min: 85
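To make the expected numbers concrete, the requirement can be reproduced outside Elasticsearch on the sample data. This is only a plain-Python sketch of the intended semantics (filter nested entries per student, take the per-student maximum, then compute stats over those maxima), not an Elasticsearch query; the names `wanted` and `per_student_max` are made up for illustration.

```python
# Sample documents from the post.
docs = [
    {"student_id": 1, "test_scores": [
        {"test_id": 101, "score": 90},
        {"test_id": 102, "score": 70},
        {"test_id": 103, "score": 80}]},
    {"student_id": 2, "test_scores": [
        {"test_id": 101, "score": 80},
        {"test_id": 102, "score": 90},
        {"test_id": 103, "score": 85}]},
    {"student_id": 3, "test_scores": [
        {"test_id": 101, "score": 30},
        {"test_id": 102, "score": 40},
        {"test_id": 103, "score": 55}]},
]

# Which test_ids count for which student (mirrors the filtering query).
wanted = {1: {101}, 2: {101, 103}}


def per_student_max(docs, wanted):
    """Return {student_id: max score among the wanted test_ids}."""
    result = {}
    for doc in docs:
        test_ids = wanted.get(doc["student_id"])
        if not test_ids:
            continue  # student not selected by the filter
        scores = [t["score"] for t in doc["test_scores"]
                  if t["test_id"] in test_ids]
        if scores:
            result[doc["student_id"]] = max(scores)
    return result


maxima = per_student_max(docs, wanted)
values = list(maxima.values())
stats = {
    "count": len(values),
    "min": min(values),
    "max": max(values),
    "avg": sum(values) / len(values),
    "sum": sum(values),
}
print(maxima)  # {1: 90, 2: 85}
print(stats)   # {'count': 2, 'min': 85, 'max': 90, 'avg': 87.5, 'sum': 175}
```

Student 3 is excluded by the filter, and student 2 contributes 85 rather than 90 because test 102 is not among the selected test_ids.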

Findings

After searching the web, I found a solution:

{
  "aggs": {
    "score_stats": { 
      "stats": {
        "script": "if(doc[\"student_id\"].value == 1){                      
                    return params._source[\"test_scores\"]                  
                        .stream()                                           
                        .filter(nested -> nested.test_id == 101)            
                        .mapToDouble(nested -> nested.score)                
                        .max()                                              
                        .orElse(0)                                          
                  } else if(doc[\"student_id\"].value == 2){                
                    return params._source[\"test_scores\"]                  
                        .stream()                                           
                        .filter(nested ->                                   
                            nested.test_id == 101 || nested.test_id == 103) 
                        .mapToDouble(nested -> nested.score)                
                        .max()                                              
                        .orElse(0)                                          
                  } else {                                                  
                    return 0                                                
                  }"          
      }
    }
  },
  "query": {
    // filtering query copied here
  }
}

Response:

"aggregations" : {
  "score_stats" : {
    "count" : 2,
    "min" : 85.0,
    "max" : 90.0,
    "avg" : 87.5,
    "sum" : 175.0
  }
}

Problem:

This solution works for the simple query above, but my real queries can be quite complex. The approach does not scale, because there is an upper limit on script length.

I also experimented with nested aggregations combined with filter aggregations, but it seems that once the aggregation is inside the nested path, conditions can no longer be combined (AND/OR) with non-nested fields.

Is there a better way to run a stats aggregation on nested fields so that only the maximum of specific nested field values per document is considered in the stats evaluation?

