Sorting collapsed results based on nested documents

I have a query that uses collapse to de-duplicate results. Some collapsed results will have several duplicates and others will be unique. I would like to sort the results so that those with the most duplicates come first.

I was expecting something like the score_mode parameter on nested queries, which would let the documents collapsed into each group affect the score of that group, but as far as I can tell nothing like that exists at the moment.

Simplified Example

Suppose I have two indexes: one containing song metadata and one containing song lyrics. I want to query both at the same time to find songs based on fields in both kinds of document.

The query might look like this:

{
  "query": {
    "bool": {
      "should": [
        {
          "term": {
            "artist": "johnny"
          }
        },
        {
          "terms": {
            "lyrics": [
              "the",
              "cat"
            ]
          }
        }
      ]
    }
  },
  "collapse": {
    "field": "song"
  }
}

I want songs that match both should clauses (that is, songs with a hit in each index) sorted to the top of the results, but I can't see a way to do this.
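To make the desired ordering concrete, here is a minimal client-side sketch in Python. It is not an Elasticsearch feature and not a solution I'm happy with: it assumes the uncollapsed hits have already been fetched as plain dicts, and the index names, scores and the helper name rank_by_duplicates are all made up for illustration. Doing this outside the cluster also gives up server-side pagination, which is why I'm hoping for something built in.

```python
from collections import defaultdict


def rank_by_duplicates(hits, field="song"):
    """Group raw (uncollapsed) hits by `field` and put the largest groups first.

    `hits` is assumed to be a list of dicts shaped like search hits,
    each with a numeric "_score" and a "_source" containing `field`.
    """
    groups = defaultdict(list)
    for hit in hits:
        groups[hit["_source"][field]].append(hit)

    # Order by number of hits per song, then by the best score in the
    # group, both descending.
    return sorted(
        groups.items(),
        key=lambda item: (len(item[1]), max(h["_score"] for h in item[1])),
        reverse=True,
    )


# Hypothetical hits: song "a" matched in both indexes, song "b" in only one.
hits = [
    {"_index": "song-metadata", "_score": 1.2, "_source": {"song": "a"}},
    {"_index": "song-lyrics", "_score": 0.8, "_source": {"song": "a"}},
    {"_index": "song-metadata", "_score": 2.0, "_source": {"song": "b"}},
]

ranked = rank_by_duplicates(hits)
for song, group in ranked:
    print(song, len(group))
```

Here song "a" comes first because it has two hits, even though song "b" has the higher individual score. That is the ordering I would like the collapsed results to have.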

I think the standard advice would be to merge the documents, but in my actual use case the song-lyrics index is populated by a task that takes a long time to run. In the meantime I want the metadata to be available to query, and updating a single merged document once the result is in causes major problems with cluster performance.

Does anyone know of a way to accomplish this without merging the documents together?
