You would use an aggregation instead of a query for this. For example, you could use a Terms aggregation.
Given these docs:
PUT my_index/doc/_bulk
{ "index" : { "_id" : "1" } }
{"name": "bob", "age": 30}
{ "index" : { "_id" : "2" } }
{"name": "bob", "age": 20}
{ "index" : { "_id" : "3" } }
{"name": "bob", "age": 30}
{ "index" : { "_id" : "4" } }
{"name": "alice", "age": 20}
{ "index" : { "_id" : "5" } }
{"name": "alice", "age": 30}
You can get the IDs by name using this aggregation:
GET my_index/_search
{
  "size": 0,
  "aggs": {
    "top_names": {
      "terms": {
        "field": "name.keyword",
        "size": 100
      },
      "aggs": {
        "top_ids": {
          "terms": {
            "field": "_id",
            "size": 100
          }
        }
      }
    }
  }
}
This returns:
"buckets": [
  {
    "key": "bob",
    "doc_count": 3,
    "top_ids": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "1",
          "doc_count": 1
        },
        {
          "key": "2",
          "doc_count": 1
        },
        {
          "key": "3",
          "doc_count": 1
        }
      ]
    }
  },
  {
    "key": "alice",
    "doc_count": 2,
    "top_ids": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "4",
          "doc_count": 1
        },
        {
          "key": "5",
          "doc_count": 1
        }
      ]
    }
  }
]
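If you consume this response from application code, the nested buckets can be flattened into a simple name-to-IDs mapping. Here is a minimal Python sketch, assuming the top_names aggregation result has already been parsed into a dict; the function name ids_by_name and the sample variable are made up for illustration and the sample is a trimmed copy of the response above.

```python
from typing import Dict, List


def ids_by_name(top_names: dict) -> Dict[str, List[str]]:
    """Map each name bucket to the document IDs nested under it."""
    return {
        bucket["key"]: [sub["key"] for sub in bucket["top_ids"]["buckets"]]
        for bucket in top_names["buckets"]
    }


# Trimmed copy of the response above (error-bound fields omitted).
sample = {
    "buckets": [
        {
            "key": "bob",
            "doc_count": 3,
            "top_ids": {
                "buckets": [
                    {"key": "1", "doc_count": 1},
                    {"key": "2", "doc_count": 1},
                    {"key": "3", "doc_count": 1},
                ]
            },
        },
        {
            "key": "alice",
            "doc_count": 2,
            "top_ids": {
                "buckets": [
                    {"key": "4", "doc_count": 1},
                    {"key": "5", "doc_count": 1},
                ]
            },
        },
    ]
}

grouped = ids_by_name(sample)
print(grouped)
```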
Aggregating on _id is not possible on older versions of Elasticsearch. You may need to replace _id with _uid (which is a concatenation of the _type and _id) if you're using an older version.
To aggregate on age, replace name.keyword with age in the request above.
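Since only the field name changes between the two variants, the request body can be generated from one helper. This is a rough Python sketch that only builds the JSON body; the function name ids_by_field_request and the aggregation name top_values are hypothetical (the request above calls it top_names), and sending the body to GET my_index/_search is left to whatever client you use.

```python
def ids_by_field_request(field: str, size: int = 100) -> dict:
    """Build a search body that groups document IDs by the given field."""
    return {
        "size": 0,  # return no hits, only aggregation results
        "aggs": {
            "top_values": {  # hypothetical name; "top_names" in the request above
                "terms": {"field": field, "size": size},
                "aggs": {
                    "top_ids": {"terms": {"field": "_id", "size": size}}
                },
            }
        },
    }


by_name = ids_by_field_request("name.keyword")  # same shape as the request above
by_age = ids_by_field_request("age")            # the age variant
```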
Note the "size": 100 in the request above. It limits the response to the 100 most common names and up to 100 IDs per name. You can increase that number if you need to retrieve more values (but you may run into memory limitations). Alternatively, if you're on version 6.1 or later, you could take a look at the composite aggregation to retrieve all values and IDs.
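For the composite aggregation route, the idea is to page through every (name, _id) pair by feeding the key of the last bucket back in as "after". Below is a minimal Python sketch that only builds request bodies and reads the paging key; the names composite_request, next_after and names_and_ids are invented for illustration, the sample page is hand-written rather than real output, and whether _id can be used as a source depends on your version and settings, so treat that part as an assumption.

```python
from typing import Optional


def composite_request(after: Optional[dict] = None, page_size: int = 100) -> dict:
    """Build a search body that pages over (name, _id) pairs."""
    composite = {
        "size": page_size,
        "sources": [
            {"name": {"terms": {"field": "name.keyword"}}},
            {"id": {"terms": {"field": "_id"}}},  # assumption: _id is aggregatable
        ],
    }
    if after is not None:
        composite["after"] = after  # resume after the last key already seen
    return {"size": 0, "aggs": {"names_and_ids": {"composite": composite}}}


def next_after(response: dict) -> Optional[dict]:
    """Return the key to resume from, or None when the page is empty."""
    agg = response["aggregations"]["names_and_ids"]
    buckets = agg["buckets"]
    if not buckets:
        return None
    # Use after_key when the response has one, else the last bucket's key.
    return agg.get("after_key", buckets[-1]["key"])


# Hand-written sample page, for illustration only.
sample_page = {
    "aggregations": {
        "names_and_ids": {
            "buckets": [
                {"key": {"name": "alice", "id": "4"}, "doc_count": 1},
                {"key": {"name": "alice", "id": "5"}, "doc_count": 1},
            ]
        }
    }
}

first_request = composite_request()
second_request = composite_request(after=next_after(sample_page))
```

You would keep sending requests built this way until a page comes back with no buckets.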