Let's say I have documents mapped such that:
PUT test_documents
{
  "mappings": {
    "doc": {
      "properties": {
        "parent_id": { "type": "keyword" },
        "body": { "type": "text" }
      }
    }
  }
}
Where body is some body of text and parent_id is the id of the parent document that the body of text came from:
PUT test_documents/doc/1
{
  "parent_id": "ZOO BOOK",
  "body": "Zoo's are places where you can see animals"
}
PUT test_documents/doc/2
{
  "parent_id": "ZOO BOOK",
  "body": "Zoo's have lots of animals"
}
PUT test_documents/doc/3
{
  "parent_id": "VET BOOK",
  "body": "Vet's are doctors for animals"
}
When I search this text for both "zoo's" and "animals", I get all three documents back, as expected:
GET test_documents/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "body": "zoo's animals"
          }
        }
      ]
    }
  }
}
What I'd like instead is for the results to contain only the highest-scoring document from each group of documents that share a parent_id.
In this case, the response would contain only 2 documents: the highest-scoring one from "ZOO BOOK" and the highest-scoring one from "VET BOOK", in order of relevance. So if the full ranking was "ZOO BOOK", "VET BOOK", "ZOO BOOK", the distinct list would just be "ZOO BOOK", "VET BOOK".
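To make the desired behavior concrete, here is a minimal Python sketch of the client-side post-processing I'd like Elasticsearch to do for me. The function name top_hit_per_parent and the scores are made up for illustration (they are not real output from the query above); the hit layout with _id, _score and _source just mirrors what a search response usually looks like.

```python
def top_hit_per_parent(hits):
    """Keep only the best-scoring hit per parent_id, ordered by relevance."""
    seen = set()
    distinct = []
    # Walk the hits from highest to lowest score; sorted() is stable,
    # so hits with equal scores keep their original relative order.
    for hit in sorted(hits, key=lambda h: h["_score"], reverse=True):
        parent = hit["_source"]["parent_id"]
        if parent not in seen:
            seen.add(parent)
            distinct.append(hit)
    return distinct


# Hypothetical scores, chosen so the ranking is ZOO BOOK, VET BOOK, ZOO BOOK.
hits = [
    {"_id": "1", "_score": 0.9, "_source": {"parent_id": "ZOO BOOK"}},
    {"_id": "3", "_score": 0.6, "_source": {"parent_id": "VET BOOK"}},
    {"_id": "2", "_score": 0.4, "_source": {"parent_id": "ZOO BOOK"}},
]

for hit in top_hit_per_parent(hits):
    print(hit["_id"], hit["_source"]["parent_id"])
```

Doing this on the client only works if I fetch every hit first, which is why I'd prefer the deduplication to happen inside the query itself.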
I tried doing an aggregation on the parent_id field, but that didn't really do what I wanted.