Sorting collapsed results based on nested documents

tombrisland · November 8, 2022, 10:20pm

I have a query that uses collapse to de-duplicate results. I expect some collapsed results to have several duplicates and others to be unique. I would like to sort the results so that those with the most duplicates come first.

I was expecting something like the score_mode parameter on nested queries, which would let the documents inside a collapsed group affect that group's score, but nothing like that exists at the moment.
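Since there is no score_mode for collapse, one possible stopgap is to ask for each group's size in the response and reorder on the client. The sketch below assumes the request adds inner_hits to the collapse block under a hypothetical name "dupes" (for example "collapse": {"field": "song", "inner_hits": {"name": "dupes", "size": 1}}), and that each collapsed hit then carries inner_hits.dupes.hits.total.value, as in recent Elasticsearch versions. It only reorders the page that was returned, so it is not a true server-side sort across pagination.

```python
from typing import Any

# Hypothetical name given to the collapse inner_hits block in the request.
INNER_HITS_NAME = "dupes"


def group_size(hit: dict[str, Any]) -> int:
    """Number of documents collapsed into this hit's group.

    Assumes the response shape inner_hits.<name>.hits.total.value.
    """
    return hit["inner_hits"][INNER_HITS_NAME]["hits"]["total"]["value"]


def sort_by_group_size(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Reorder one page of collapsed hits so the largest groups come first.

    sorted() is stable, so groups of equal size keep Elasticsearch's order.
    """
    return sorted(response["hits"]["hits"], key=group_size, reverse=True)


# Hand-written response fragment for illustration only (not real output).
response = {
    "hits": {
        "hits": [
            {"_id": "a", "inner_hits": {"dupes": {"hits": {"total": {"value": 1}}}}},
            {"_id": "b", "inner_hits": {"dupes": {"hits": {"total": {"value": 2}}}}},
        ]
    }
}

print([h["_id"] for h in sort_by_group_size(response)])  # ['b', 'a']
```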

Simplified Example

Suppose I have two indices: one containing song metadata and one containing song lyrics. I want to query both at once to find songs using fields from both kinds of document.

The query might look like this:

{
  "query": {
    "bool": {
      "should": [
        {
          "term": {
            "artist": "johnny"
          }
        },
        {
          "terms": {
            "lyrics": [
              "the",
              "cat"
            ]
          }
        }
      ]
    }
  },
  "collapse": {
    "field": "song"
  }
}
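For reference, the same request can be written as a Python dict with two optional additions that make the groups easier to inspect on the client: a _name on each should clause, so each hit reports which clauses it matched in matched_queries, and inner_hits on the collapse, so each group returns some of its members and its total count. The clause names and the inner_hits name "dupes" are hypothetical. The collapse field (song) also needs to be a keyword or numeric field with doc values in both indices.

```python
import json

# Hypothetical clause names; Elasticsearch echoes matched names back per hit.
ARTIST_CLAUSE = "artist_match"
LYRICS_CLAUSE = "lyrics_match"

query_body = {
    "query": {
        "bool": {
            "should": [
                # term query in its object form so it can carry a _name
                {"term": {"artist": {"value": "johnny", "_name": ARTIST_CLAUSE}}},
                # for a terms query, _name sits next to the field name
                {"terms": {"lyrics": ["the", "cat"], "_name": LYRICS_CLAUSE}},
            ]
        }
    },
    "collapse": {
        "field": "song",
        # return up to 10 members of each group along with the group's total
        "inner_hits": {"name": "dupes", "size": 10},
    },
}

print(json.dumps(query_body, indent=2))
```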

I would like results that match both should clauses to sort to the top, but I can't see a way to do this.
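A client-side approximation of "both clauses first" is sketched below. It assumes the request names each should clause with _name and adds collapse inner_hits under a hypothetical name "dupes", and it assumes each top hit and each inner hit carries a matched_queries list; whether inner hits report matched_queries should be checked on your Elasticsearch version. Because the metadata document and the lyrics document of one song each match a different clause, the sketch takes the union of matched clauses across the whole group, then ranks by that count and breaks ties by score. Like any client-side reordering, it only affects the returned page.

```python
from typing import Any

INNER_HITS_NAME = "dupes"  # hypothetical inner_hits name from the request


def clauses_matched(hit: dict[str, Any]) -> set[str]:
    """Union of named should clauses matched by any document in the group."""
    matched = set(hit.get("matched_queries", []))
    members = hit.get("inner_hits", {}).get(INNER_HITS_NAME, {}).get("hits", {}).get("hits", [])
    for member in members:
        matched.update(member.get("matched_queries", []))
    return matched


def sort_by_clauses_matched(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Groups matching more distinct clauses first, then higher _score first."""
    return sorted(
        response["hits"]["hits"],
        key=lambda h: (len(clauses_matched(h)), h.get("_score") or 0.0),
        reverse=True,
    )


# Hand-written response fragment for illustration only (not real output).
response = {
    "hits": {
        "hits": [
            {
                "_id": "m1", "_score": 2.0, "matched_queries": ["artist_match"],
                "inner_hits": {"dupes": {"hits": {"hits": [
                    {"_id": "m1", "matched_queries": ["artist_match"]},
                ]}}},
            },
            {
                "_id": "l2", "_score": 1.5, "matched_queries": ["lyrics_match"],
                "inner_hits": {"dupes": {"hits": {"hits": [
                    {"_id": "l2", "matched_queries": ["lyrics_match"]},
                    {"_id": "m2", "matched_queries": ["artist_match"]},
                ]}}},
            },
        ]
    }
}

print([h["_id"] for h in sort_by_clauses_matched(response)])  # ['l2', 'm1']
```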

I think the standard advice would be to merge the two documents into one, but in my actual use case the song-lyrics index is populated by a task that takes a long time to run. In the meantime I want the metadata to be queryable, and updating each document individually once its lyrics result is in causes major problems for cluster performance.

Does anyone know of a way to accomplish this without merging the documents together?

system · December 6, 2022, 10:20pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Related topics

Topic	Category	Replies	Views	Activity
Collapsed results participating in sorting	Elasticsearch	1	438	August 10, 2021
How to sort results based on number of collapsed items?	Elasticsearch	3	490	April 14, 2019
Sorting parent documents based on the score of a nested child	Elasticsearch	4	1218	August 13, 2018
Sort on collapsed best inner_hit document field value	Elasticsearch	2	420	October 11, 2019
Field collapse - can't sort inner hits to take latest document?	Elasticsearch	6	2802	February 27, 2018