Return only highest scoring document from a family of documents

Jon_Hourany · October 9, 2018, 2:10am

Let's say I have an index with the following mapping:

PUT test_documents
{
  "mappings": {
    "doc": {
      "properties": {
        "parent_id": { "type": "keyword" },
        "body": { "type": "text" }
      }
    }
  }
}

Here, body is a passage of text and parent_id is the ID of the parent document that the passage came from:

PUT test_documents/doc/1
{
  "parent_id": "ZOO BOOK",
  "body": "Zoo's are places where you can see animals"
}

PUT test_documents/doc/2
{
  "parent_id": "ZOO BOOK",
  "body": "Zoo's have lots of animals"
}

PUT test_documents/doc/3
{
  "parent_id": "VET BOOK",
  "body": "Vet's are doctors for animals"
}

When I search this text for both "zoo's" and "animals", I get all three documents back, as expected:

GET test_documents/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "body": "zoo's animals"
          }
        }
      ]
    }
  }
}

But what I'd like is for the response to contain only the highest-scoring document from each group of documents that share a parent_id. In this case, the response would have only 2 documents: the highest-scoring one from "ZOO BOOK" and the highest-scoring one from "VET BOOK", in order of relevance. So if the full ranking was "ZOO BOOK", "VET BOOK", "ZOO BOOK", the distinct list would just be "ZOO BOOK", "VET BOOK".

I tried an aggregation on the parent_id field, but that didn't really do what I wanted.

abdon · October 10, 2018, 2:38pm

Take a look at the field collapsing feature. It lets you return only the highest-scoring document for each unique value of a specific field.

For what you want to do, your request would look something like this:

GET test_documents/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "body": "zoo's animals"
          }
        }
      ]
    }
  },
  "collapse": {
    "field": "parent_id"
  }
}
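
To make the expected result concrete, here is a minimal client-side sketch in Python of what collapsing on parent_id amounts to: keep the best-scoring hit per distinct value, then order the survivors by score. It is only an illustration of the semantics, not how Elasticsearch implements the feature. The function name collapse_by_field and the scores in sample_hits are made up for this example; real scores depend on your index.

```python
from typing import Dict, List


def collapse_by_field(hits: List[dict], field: str) -> List[dict]:
    """Keep the highest-scoring hit for each distinct value of `field`.

    `hits` is assumed to look like the entries of hits.hits in a search
    response: each has a "_score" and a "_source" containing `field`.
    """
    best: Dict[str, dict] = {}
    for hit in hits:
        key = hit["_source"][field]
        # Replace the stored hit only if this one scores strictly higher.
        if key not in best or hit["_score"] > best[key]["_score"]:
            best[key] = hit
    # Return the surviving hits in descending order of relevance.
    return sorted(best.values(), key=lambda h: h["_score"], reverse=True)


# Hypothetical scores, chosen to reproduce the ordering from the question:
# "ZOO BOOK", "VET BOOK", "ZOO BOOK".
sample_hits = [
    {"_id": "2", "_score": 0.9, "_source": {"parent_id": "ZOO BOOK"}},
    {"_id": "3", "_score": 0.4, "_source": {"parent_id": "VET BOOK"}},
    {"_id": "1", "_score": 0.3, "_source": {"parent_id": "ZOO BOOK"}},
]

collapsed = collapse_by_field(sample_hits, "parent_id")
print([h["_source"]["parent_id"] for h in collapsed])
```

With the collapse request above you get this behavior from the search itself, so no post-processing like this should be needed on the client.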

system · November 7, 2018, 2:47pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
