Terms aggregation on a nested field for calculating stats based on a parent-field

Hello,

in my use case i have documents with a client and a price. The client is a nested object (containing id, and acronym), the price is on the first json level of my source). I would like to calculate metric values (i.e. stats metrics) grouped by client.id. The mapping looks as follows:

PUT /test
{
  "mappings": {
	"properties": {
	  "client": {
		"type": "nested",
		"properties": {
		  "id": {
			"type": "long"
		  }
		}
	  },
	  "price": {
		"type": "long"
	  }
	}
  }
}

Then i inserted 3 documents. The field client2 is the same as client, but it is not explicitly defined as nested in the mapping (for learning purposes). :

PUT /test/_doc/1
{
  "client": {
	"id": 1,
	"acronym": "O"
  },
  "client2": {
	"id": 1,
	"acronym": "O"
  },
  "price": 10
}

PUT /test/_doc/2
{
  "client": {
	"id": 2,
	"acronym": "U"
  },
  "client2": {
	"id": 2,
	"acronym": "U"
  },
  "price": 10
}

PUT /test/_doc/3
{
  "client": {
	"id": 2,
	"acronym": "U"
  },
  "client2": {
	"id": 2,
	"acronym": "U"
  },
  "price": 10
}

Now the query for getting the stats per "client2.id" is:

POST /test/_search
{
  "size": 0,
  "aggs": {
	"GROUP_BY_CLIENT_ID": {
	  "terms": {
		"field": "client2.id",
		"min_doc_count": 0
	  },
	  "aggs": {
		"STATS_FOR_CLIENT": {
		  "stats": {
			"field": "price"
		  }
		}
	  }
	}
  }
} 

And i get the result i want:

"aggregations" : {
	"GROUP_BY_CLIENT_ID" : {
	  "doc_count_error_upper_bound" : 0,
	  "sum_other_doc_count" : 0,
	  "buckets" : [
		{
		  "key" : 2,
		  "doc_count" : 2,
		  "STATS_FOR_CLIENT" : {
			"count" : 2,
			"min" : 10.0,
			"max" : 10.0,
			"avg" : 10.0,
			"sum" : 20.0
		  }
		},
		{
		  "key" : 1,
		  "doc_count" : 1,
		  "STATS_FOR_CLIENT" : {
			"count" : 1,
			"min" : 10.0,
			"max" : 10.0,
			"avg" : 10.0,
			"sum" : 10.0
		  }
		}
	  ]
	}
  }

The same query (only replace "client2.id" with "client.id" is not working and gives me 0 elements in both buckets.
Query:

POST /test/_search
{
  "size": 0,
  "aggs": {
    "GROUP_BY_CLIENT_ID": {
      "terms": {
        "field": "client.id",
        "min_doc_count": 0
      },
      "aggs": {
        "STATS_FOR_CLIENT": {
          "stats": {
            "field": "price"
          }
        }
      }
    }
  }
}

Result:

"aggregations" : {
    "GROUP_BY_CLIENT_ID" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : 1,
          "doc_count" : 0,
          "STATS_FOR_CLIENT" : {
            "count" : 0,
            "min" : null,
            "max" : null,
            "avg" : null,
            "sum" : 0.0
          }
        },
        {
          "key" : 2,
          "doc_count" : 0,
          "STATS_FOR_CLIENT" : {
            "count" : 0,
            "min" : null,
            "max" : null,
            "avg" : null,
            "sum" : 0.0
          }
        }
      ]
    }
  }

Using the "client2.id" approach is not the solution, because i need also to make sure to search in a nested way. So removing the definition of "nested" in the mapping for the "client" field is not an option. As "client" is explicitly defined as "nested" in the mapping, i am forced to do a nested aggregation according to the solution of this post . I tried to merge this solution with my issue, but i don't get the result i without stats.
Query:

POST /test/_search
{
  "size": 0,
  "aggs": {
	"GROUP_BY_CLIENT": {
	  "nested": {
		"path": "client"
	  },
	  "aggs": {
		"GROUP_BY_CLIENT_ID": {
		  "terms": {
			"field": "client.id",
			"min_doc_count": 0
		  },
		  "aggs": {
			"STATS_FOR_CLIENT_PRICES": {
			  "stats": {
				"field": "price"
			  }
			}
		  }
		}
	  }
	}
  }
}

In the result i get now the right query counts, but not the right stats. The result looks like:

"aggregations" : {
	"GROUP_BY_CLIENT" : {
	  "doc_count" : 3,
	  "GROUP_BY_CLIENT_ID" : {
		"doc_count_error_upper_bound" : 0,
		"sum_other_doc_count" : 0,
		"buckets" : [
		  {
			"key" : 2,
			"doc_count" : 2,
			"STATS_FOR_CLIENT_PRICES" : {
			  "count" : 0,
			  "min" : null,
			  "max" : null,
			  "avg" : null,
			  "sum" : 0.0
			}
		  },
		  {
			"key" : 1,
			"doc_count" : 1,
			"STATS_FOR_CLIENT_PRICES" : {
			  "count" : 0,
			  "min" : null,
			  "max" : null,
			  "avg" : null,
			  "sum" : 0.0
			}
		  }
		]
	  }
	}
}

I think the problem is the price, which is one level higher than the nested "client" context. What is the query in order to group the stats by "client.id" in this use case? Any ideas or approaches out there?

Thanks a lot for your help and best regards
Andy

I was able to solve it by myself by using reverse nested aggregation. The query looks now like:

POST /test/_search
{
  "size": 0,
  "aggs": {
    "GO_INTO_CLIENT": {
      "nested": {
        "path": "client"
      },
      "aggs": {
        "GROUP_BY_CLIENT_ID": {
          "terms": {
            "field": "client.id",
            "min_doc_count": 0
          },
          "aggs": {
            "GO_FROM_CLIENT_TO_ROOT": {
              "reverse_nested": {},
              "aggs": {
                "PRICE_STATS": {
                  "stats": {
                    "field": "price"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

And this gives me the result i want:

"aggregations" : {
    "GROUP_BY_CLIENT" : {
      "doc_count" : 3,
      "GROUP_BY_CLIENT_ID" : {
        "doc_count_error_upper_bound" : 0,
        "sum_other_doc_count" : 0,
        "buckets" : [
          {
            "key" : 2,
            "doc_count" : 2,
            "GO_FROM_CLIENT_TO_ROOT" : {
              "doc_count" : 2,
              "PRICE_STATS" : {
                "count" : 2,
                "min" : 10.0,
                "max" : 10.0,
                "avg" : 10.0,
                "sum" : 20.0
              }
            }
          },
          {
            "key" : 1,
            "doc_count" : 1,
            "GO_FROM_CLIENT_TO_ROOT" : {
              "doc_count" : 1,
              "PRICE_STATS" : {
                "count" : 1,
                "min" : 10.0,
                "max" : 10.0,
                "avg" : 10.0,
                "sum" : 10.0
              }
            }
          }
        ]
      }
    }
  }

Maybe it will also help somebody else in the future. Thanks again and best regards.

Andy

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.