Group documents by similarity using ELSER

Hi Anton,

Thanks for using ELSER.

Have you tried the combination of histogram and top hits aggregations on the text_expansion search results? Something like this:

GET my-index/_search
{
  "query": {
    "text_expansion": {
      "ml.tokens": {
        "model_id": ".elser_model_1",
        "model_text": "How to avoid muscle soreness after running?"
      }
    }
  },
  "aggs": {
    "histogram_by_score": {
      "histogram": {
        "script": "_score",
        "interval": 5,
        "min_doc_count": 1
      },
      "aggs": {
        "top_7_documents": {
          "top_hits": {
            "_source": [
              "name",
              "price",
              "nameTokens"
            ],
            "size": 7
          }
        }
      }
    }
  }
}

This should group the results into _score buckets of fixed width (5) and return the top 7 documents in each bucket. Because min_doc_count is 1, empty buckets are left out.

You can also use a range aggregation with a top hits sub-aggregation if you don't want a fixed interval.
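
A rough, untested sketch of the range variant, assuming the same index, field and model as in the request above. The bucket boundaries (5 and 10) and the aggregation name ranges_by_score are placeholders I made up for illustration, so adjust them to the score range you actually see in your results:

```
# Sketch only: the boundaries 5 and 10 are illustrative, not recommended values
GET my-index/_search
{
  "query": {
    "text_expansion": {
      "ml.tokens": {
        "model_id": ".elser_model_1",
        "model_text": "How to avoid muscle soreness after running?"
      }
    }
  },
  "aggs": {
    "ranges_by_score": {
      "range": {
        "script": "_score",
        "ranges": [
          { "to": 5 },
          { "from": 5, "to": 10 },
          { "from": 10 }
        ]
      },
      "aggs": {
        "top_7_documents": {
          "top_hits": {
            "size": 7
          }
        }
      }
    }
  }
}
```

In a range aggregation each bucket includes its from value and excludes its to value, so a document scoring exactly 5 would land in the second bucket here.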
