Hi,
you can combine a Terms Aggregation with a Top Hits Aggregation. The first lets you group on one or more fields; the latter lets you retrieve the top-ranked document(s) in each resulting bucket.
Assuming you have documents like the ones you describe, this would roughly look like the following:
GET /index/doc/_search
{
  "aggs": {
    "agg1": {
      "terms": {
        "field": "title"
      },
      "aggs": {
        "agg2": {
          "terms": {
            "field": "type"
          },
          "aggs": {
            "top_docs": {
              "top_hits": {
                "sort": [
                  {
                    "subject": {
                      "order": "asc"
                    }
                  }
                ],
                "_source": {
                  "include": [
                    "title", "type", "subject"
                  ]
                },
                "size": 1
              }
            }
          }
        }
      }
    }
  },
  "size": 0
}
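If you need to group on more than two fields, writing the nesting by hand gets tedious. Below is a minimal Python sketch that builds the same request body as a plain dict. The helper name build_grouped_top_hits and its parameters are made up for illustration and are not part of any Elasticsearch client; the aggregation names (agg1, agg2, top_docs) simply follow the request above.

```python
import json


def build_grouped_top_hits(group_fields, sort_field, source_fields, top_n=1):
    """Nest one terms aggregation per group field around a top_hits aggregation.

    Hypothetical helper: it only assembles a dict, it does not talk to a cluster.
    """
    # Innermost level: the top_hits aggregation that returns the documents.
    aggs = {
        "top_docs": {
            "top_hits": {
                "sort": [{sort_field: {"order": "asc"}}],
                "_source": {"include": list(source_fields)},
                "size": top_n,
            }
        }
    }
    # Wrap from the last group field outwards, so the first field ends up outermost.
    for level in range(len(group_fields), 0, -1):
        aggs = {
            "agg%d" % level: {
                "terms": {"field": group_fields[level - 1]},
                "aggs": aggs,
            }
        }
    # "size": 0 suppresses the regular search hits; only aggregations are returned.
    return {"aggs": aggs, "size": 0}


body = build_grouped_top_hits(
    ["title", "type"], "subject", ["title", "type", "subject"]
)
print(json.dumps(body, indent=2))
```

The resulting dict can be serialized and sent as the body of the search request with whatever HTTP or Elasticsearch client you already use.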
And a result would look something like:
"aggregations": {
  "agg1": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
      {
        "key": "a",
        "doc_count": 3,
        "agg2": {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 0,
          "buckets": [
            {
              "key": "b",
              "doc_count": 2,
              "top_docs": {
                "hits": {
                  "total": 2,
                  "max_score": null,
                  "hits": [
                    {
                      "_index": "index",
                      "_type": "doc",
                      "_id": "AVNby0Besrt2YTZcBcvs",
                      "_score": null,
                      "_source": {
                        "subject": "one",
                        "title": "a",
                        "type": "b"
                      },
                      "sort": [
                        "one"
                      ]
                    }
                  ]
                }
              }
            },
            {
              "key": "c",
              "doc_count": 1,
              "top_docs": {
                "hits": {
                  "total": 1,
                  "max_score": null,
                  "hits": [
                    {
                      "_index": "index",
                      "_type": "doc",
                      "_id": "AVNby96Usrt2YTZcBcvu",
                      "_score": null,
                      "_source": {
                        "subject": "three",
                        "title": "a",
                        "type": "c"
                      },
                      "sort": [
                        "three"
                      ]
                    }
                  ]
                }
              }
            }
          ]
        }
      }
    ]
  }
}
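On the client side you usually want one row per (title, type) group rather than the nested structure. Here is a small Python sketch that walks the "aggregations" part of the response; the function name flatten_top_docs is made up for illustration, and the sample dict is a trimmed copy of the result above with only the keys the function reads.

```python
def flatten_top_docs(aggregations):
    """Return (title, type, _source) tuples, one per top hit in each inner bucket."""
    rows = []
    for title_bucket in aggregations["agg1"]["buckets"]:
        for type_bucket in title_bucket["agg2"]["buckets"]:
            # With "size": 1 there is exactly one hit per bucket, but the
            # loop also works if top_hits is asked for more documents.
            for hit in type_bucket["top_docs"]["hits"]["hits"]:
                rows.append((title_bucket["key"], type_bucket["key"], hit["_source"]))
    return rows


# Trimmed copy of the response shown above (illustrative only).
sample = {
    "agg1": {
        "buckets": [
            {
                "key": "a",
                "agg2": {
                    "buckets": [
                        {
                            "key": "b",
                            "top_docs": {"hits": {"hits": [
                                {"_source": {"subject": "one", "title": "a", "type": "b"}}
                            ]}},
                        },
                        {
                            "key": "c",
                            "top_docs": {"hits": {"hits": [
                                {"_source": {"subject": "three", "title": "a", "type": "c"}}
                            ]}},
                        },
                    ]
                },
            }
        ]
    }
}

rows = flatten_top_docs(sample)
for title, doc_type, source in rows:
    print(title, doc_type, source["subject"])
```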
Note that in this case we only get the topmost document per bucket ("size": 1), but you could raise the size to get more. By default the order is determined by the relevance score from the main query (omitted here); in this case it is determined by the ascending sort on the subject field. There are many other ways to order the hits.
Hope this helps.