Denormalized relationship aggregations

According to the definitive guide (https://www.elastic.co/guide/en/elasticsearch/guide/current/denormalization.html), denormalization is the most performant approach to modeling relationships in Elasticsearch. I've experimented with denormalization a bit, grouping results with a terms aggregation and a top_hits sub-aggregation. I've read that grouping results this way breaks paging, but that may be fixed by the field collapsing feature in request body search (https://www.elastic.co/guide/en/elasticsearch/reference/5.3/search-request-collapse.html).

The problem: if I add bucket aggregations for faceted search, their counts are at the child level (the many side of the one-to-many relationship), while I need them at the grouped level (the one side of the one-to-many relationship). Is it possible to have bucket aggregations whose counts are based on the groups produced by field collapsing?

Here's a contrived example:

curl -XPUT 'http://localhost:9200/cars?pretty' -H 'Content-Type: application/json' -d '
{
	"mappings": {
		"person_car": {
			"properties": {
				"owner": {
					"type": "keyword"
				},
				"make": {
					"type": "keyword"
				},
				"model": {
					"type": "keyword"
				},
				"year": {
					"type": "long"
				}
			}
		}
	}
}
'

curl -XPUT 'http://localhost:9200/cars/person_car/1?pretty' -H 'Content-Type: application/json' -d '
{
	"owner": "Owner1",
	"make": "Toyota",
	"model": "Tundra",
	"year": 2016
}
'

curl -XPUT 'http://localhost:9200/cars/person_car/2?pretty' -H 'Content-Type: application/json' -d '
{
	"owner": "Owner2",
	"make": "Toyota",
	"model": "Camry",
	"year": 2015
}
'

curl -XPUT 'http://localhost:9200/cars/person_car/3?pretty' -H 'Content-Type: application/json' -d '
{
	"owner": "Owner2",
	"make": "Toyota",
	"model": "MR2",
	"year": 1993
}
'

curl http://localhost:9200/cars/person_car/_search?pretty -H'Content-Type: application/json' -d '
{
	"query": {
		"match_all": {}
	},
	"aggs": {
		"makes": {
			"terms": {
				"field": "make"
			}
		}
	},
	"collapse": {
		"field": "owner",
		"inner_hits": {
			"name": "cars",
			"size": 5
		}
	}
}
'

The aggregation portion of the query results:

"aggregations" : {
    "makes" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "Toyota",
          "doc_count" : 3
        }
      ]
    }
  }

This counts 3 Toyota documents at the person_car level, but I need doc_count: 2 for Toyota, since only 2 owners have Toyotas.
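
One direction that might get close, sketched here as an assumption rather than a confirmed answer: nest a cardinality sub-aggregation on owner under each make bucket. The bucket's doc_count stays at the person_car level, but the sub-aggregation reports the number of distinct owners per make. The name owner_count is purely illustrative, and cardinality is an approximate count by design, so this only fits if near-exact numbers are acceptable on larger data sets.

```shell
# Sketch: count distinct owners per make instead of person_car documents.
# "owner_count" is an illustrative name; cardinality is approximate by design.
BODY='{
  "size": 0,
  "aggs": {
    "makes": {
      "terms": { "field": "make" },
      "aggs": {
        "owner_count": {
          "cardinality": { "field": "owner" }
        }
      }
    }
  }
}'

# Assumes the cars index from the example above, served on localhost:9200.
curl -s --max-time 5 'http://localhost:9200/cars/person_car/_search?pretty' \
  -H 'Content-Type: application/json' -d "$BODY" \
  || echo 'search failed (is Elasticsearch running on localhost:9200?)' >&2
```

With this shape, the facet count for each make would be read from owner_count.value rather than from doc_count; the three sample documents contain two distinct owners for Toyota. As far as I understand, collapse only affects the returned hits and not the aggregations, so the collapse block from the search above could be added alongside aggs in the same request.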
