Collapsed results participating in sorting

I am using Elasticsearch and want to group my results by a specific field, returning only the most recent document per group. When scoring and sorting, I want the documents I am not returning (the older ones in each group) to be ignored.

I have tried approaching this with collapse; however, the "hidden" documents are still taken into account when sorting, which I would like to avoid.

Example

In the following example I have 2 groups of documents. I would like to group them by email, take the most recent document of each group by created_at, and sort the results by rating, descending.

With the example data, the most recent documents are Aaa 1 (email aaa@aaa.com) and Bbb 4 (email bbb@bbb.com). Sorting by rating descending, I expect Bbb 4 (rating 4) and then Aaa 1 (rating 1). However, they are returned the other way around, because Aaa 2 and Aaa 3 also take part in the sort, which I want to avoid.
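To pin down the ordering I am after, here is the same logic as a small plain-Python sketch over the five sample documents from the requests further down. The function name `latest_per_email_by_rating` is only illustrative and has nothing to do with Elasticsearch itself.

```python
# Sample documents, copied from the indexing requests in this post.
DOCS = [
    {"name": "Aaa 1", "rating": 1, "created_at": "2021-01-01", "email": "aaa@aaa.com"},
    {"name": "Aaa 2", "rating": 20, "created_at": "2020-01-01", "email": "aaa@aaa.com"},
    {"name": "Aaa 3", "rating": 30, "created_at": "2019-01-01", "email": "aaa@aaa.com"},
    {"name": "Bbb 4", "rating": 4, "created_at": "2021-01-02", "email": "bbb@bbb.com"},
    {"name": "Bbb 5", "rating": 5, "created_at": "2020-01-02", "email": "bbb@bbb.com"},
]


def latest_per_email_by_rating(docs):
    """Keep only the newest document per email, then sort those by rating."""
    latest = {}
    for doc in docs:
        current = latest.get(doc["email"])
        # ISO-8601 dates (YYYY-MM-DD) compare correctly as plain strings.
        if current is None or doc["created_at"] > current["created_at"]:
            latest[doc["email"]] = doc
    # Only the surviving documents take part in the rating sort.
    return sorted(latest.values(), key=lambda d: d["rating"], reverse=True)


print([d["name"] for d in latest_per_email_by_rating(DOCS)])  # ['Bbb 4', 'Aaa 1']
```

The older documents are dropped before the sort, so their ratings (20, 30 and 5) cannot influence the final order.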

How can I write my query so that it returns Bbb 4 and then Aaa 1? Should I be using the top_hits aggregation instead?
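One direction I am considering, as an untested sketch: a terms aggregation on email, ordered by a top_metrics sub-aggregation that picks the rating of the newest document, with a top_hits sub-aggregation to fetch that document. This assumes a version where top_metrics is available (I believe 7.7 or later) and that terms buckets can be ordered by it. The aggregation names `by_group` and `latest` and the helper `latest_by_rating_body` are placeholders of mine. The Python below only builds and prints the request body; it does not talk to a cluster.

```python
import json


def latest_by_rating_body(group_field="email", date_field="created_at",
                          rank_field="rating"):
    """Build a search body that ranks groups by the rating of their newest doc."""
    return {
        "size": 0,  # only the aggregation buckets are of interest
        "aggs": {
            "by_group": {
                "terms": {
                    "field": group_field,
                    # Assumption: buckets can be ordered by the top_metrics value.
                    "order": {f"latest.{rank_field}": "desc"},
                },
                "aggs": {
                    # Rating of the most recent document in each bucket.
                    "latest": {
                        "top_metrics": {
                            "metrics": {"field": rank_field},
                            "sort": {date_field: "desc"},
                        }
                    },
                    # The most recent document itself, mirroring my inner_hits.
                    "last_document": {
                        "top_hits": {
                            "size": 1,
                            "sort": [{date_field: {"order": "desc"}}],
                            "_source": ["name", "email", "rating"],
                        }
                    },
                },
            }
        },
    }


print(json.dumps(latest_by_rating_body(), indent=2))
```

The printed JSON could be pasted as the body of a GET test/_search request. I have not verified how this behaves with many groups or multiple shards, where ordering terms buckets by a sub-aggregation may be approximate.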

PUT test
{
  "mappings": {
    "properties": {
      "name": {
        "type": "keyword"
      },
      "email": {
        "type": "keyword"
      },
      "description": {
        "type": "text"
      },
      "rating": {
        "type": "integer"
      },
      "created_at": {
        "type": "date"
      }
    }
  }
}

POST test/_doc
{
  "name": "Aaa 1",
  "rating": 1,
  "created_at": "2021-01-01",
  "description": "A quick fox",
  "email": "aaa@aaa.com"
}

POST test/_doc
{
  "name": "Aaa 2",
  "rating": 20,
  "created_at": "2020-01-01",
  "description": "jumps over",
  "email": "aaa@aaa.com"
}

POST test/_doc
{
  "name": "Aaa 3",
  "rating": 30,
  "created_at": "2019-01-01",
  "description": "the fence",
  "email": "aaa@aaa.com"
}

POST test/_doc
{
  "name": "Bbb 4",
  "rating": 4,
  "created_at": "2021-01-02",
  "description": "behind the house",
  "email": "bbb@bbb.com"
}

POST test/_doc
{
  "name": "Bbb 5",
  "rating": 5,
  "created_at": "2020-01-02",
  "description": "we live in",
  "email": "bbb@bbb.com"
}

GET test/_search
{
  "_source": false,
  "track_total_hits": false,
  "query": {
    "bool": {
      "should": {
        "match_all": {}
      }
    }
  },
  "collapse": {
    "field": "email",
    "inner_hits": [
      {
        "name": "last_document",
        "size": 1,
        "_source": ["name","email","rating"],
        "sort": [
          {
            "created_at": {
              "order": "desc"
            }
          }
        ]
      }
    ]
  },
  "sort": [
    {
      "rating": {
        "order": "desc"
      }
    }
  ]
}

This returns

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "max_score" : null,
    "hits" : [
      {
        "_index" : "test",
        "_type" : "_doc",
        "_id" : "bccEn3oBRQ1dOOnBe3nD",
        "_score" : null,
        "fields" : {
          "email" : [
            "aaa@aaa.com"
          ]
        },
        "sort" : [
          30
        ],
        "inner_hits" : {
          "last_document" : {
            "hits" : {
              "total" : {
                "value" : 3,
                "relation" : "eq"
              },
              "max_score" : null,
              "hits" : [
                {
                  "_index" : "test",
                  "_type" : "_doc",
                  "_id" : "a8cEn3oBRQ1dOOnBdXli",
                  "_score" : null,
                  "_source" : {
                    "name" : "Aaa 1",
                    "rating" : 1,
                    "email" : "aaa@aaa.com"
                  },
                  "sort" : [
                    1609459200000
                  ]
                }
              ]
            }
          }
        }
      },
      {
        "_index" : "test",
        "_type" : "_doc",
        "_id" : "b8cEn3oBRQ1dOOnBiHkx",
        "_score" : null,
        "fields" : {
          "email" : [
            "bbb@bbb.com"
          ]
        },
        "sort" : [
          5
        ],
        "inner_hits" : {
          "last_document" : {
            "hits" : {
              "total" : {
                "value" : 2,
                "relation" : "eq"
              },
              "max_score" : null,
              "hits" : [
                {
                  "_index" : "test",
                  "_type" : "_doc",
                  "_id" : "bscEn3oBRQ1dOOnBgHlt",
                  "_score" : null,
                  "_source" : {
                    "name" : "Bbb 4",
                    "rating" : 4,
                    "email" : "bbb@bbb.com"
                  },
                  "sort" : [
                    1609545600000
                  ]
                }
              ]
            }
          }
        }
      }
    ]
  }
}
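The group-level sort values in this response are 30 and 5, which are the ratings of Aaa 3 and Bbb 5 rather than of the documents shown in inner_hits. My understanding is that collapse keeps the top-sorted document per key, and that inner_hits is sorted separately. A rough plain-Python model of that behaviour (the name `collapse_model` is mine, and this is a simplification, not how Elasticsearch is implemented):

```python
# Sample documents, copied from the indexing requests in this post.
DOCS = [
    {"name": "Aaa 1", "rating": 1, "created_at": "2021-01-01", "email": "aaa@aaa.com"},
    {"name": "Aaa 2", "rating": 20, "created_at": "2020-01-01", "email": "aaa@aaa.com"},
    {"name": "Aaa 3", "rating": 30, "created_at": "2019-01-01", "email": "aaa@aaa.com"},
    {"name": "Bbb 4", "rating": 4, "created_at": "2021-01-02", "email": "bbb@bbb.com"},
    {"name": "Bbb 5", "rating": 5, "created_at": "2020-01-02", "email": "bbb@bbb.com"},
]


def collapse_model(docs):
    """Return (email, group sort value, newest doc name) per collapsed group."""
    # Top-level "sort": every matching document is ranked by rating.
    ranked = sorted(docs, key=lambda d: d["rating"], reverse=True)
    seen = set()
    groups = []
    for doc in ranked:
        if doc["email"] in seen:
            continue  # a higher-rated document already represents this group
        seen.add(doc["email"])
        # inner_hits: sorted independently, newest document of the group first.
        members = [d for d in docs if d["email"] == doc["email"]]
        newest = max(members, key=lambda d: d["created_at"])
        groups.append((doc["email"], doc["rating"], newest["name"]))
    return groups


print(collapse_model(DOCS))
# [('aaa@aaa.com', 30, 'Aaa 1'), ('bbb@bbb.com', 5, 'Bbb 4')]
```

This reproduces the order and the sort values I am seeing: each group is positioned by its highest rating, not by the rating of its newest document.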
