Group documents by similarity using Elser

wei.wang · September 13, 2023, 4:00pm

Hi Anton,

Thanks for using ELSER.

Have you tried the combination of histogram and top hits aggregations on the text_expansion search results? Something like this:

GET my-index/_search
{
  "query": {
    "text_expansion": {
      "ml.tokens": {
        "model_id": ".elser_model_1",
        "model_text": "How to avoid muscle soreness after running?"
      }
    }
  },
  "aggs": {
    "histogram_by_score": {
      "histogram": {
        "script": "_score",
        "interval": 5,
        "min_doc_count": 1
      },
      "aggs": {
        "top_7_documents": {
          "top_hits": {
            "_source": [
              "name",
              "price",
              "nameTokens"
            ],
            "size": 7
          }
        }
      }
    }
  }
}

This should return the matching documents grouped into fixed-width _score buckets (interval 5), with the top 7 documents in each bucket.

You can also use a range aggregation with a top hits sub-aggregation if you don't want fixed-width intervals.
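
As a rough sketch of that second option, the request below swaps the histogram for a range aggregation. The bucket boundaries (5 and 10) and the aggregation name ranges_by_score are placeholders made up for illustration; ELSER scores are not bounded to a fixed scale, so pick boundaries that fit the scores you actually see. It also assumes the range aggregation accepts "script": "_score" the same way the histogram above does, so treat it as a starting point rather than a tested request.

```console
# Sketch only: the boundaries 5 and 10 and the name ranges_by_score are hypothetical.
# Assumes "script": "_score" works in a range aggregation as it does in the histogram.
GET my-index/_search
{
  "query": {
    "text_expansion": {
      "ml.tokens": {
        "model_id": ".elser_model_1",
        "model_text": "How to avoid muscle soreness after running?"
      }
    }
  },
  "aggs": {
    "ranges_by_score": {
      "range": {
        "script": "_score",
        "ranges": [
          { "to": 5 },
          { "from": 5, "to": 10 },
          { "from": 10 }
        ]
      },
      "aggs": {
        "top_7_documents": {
          "top_hits": {
            "size": 7
          }
        }
      }
    }
  }
}
```

Each range includes its "from" value and excludes its "to" value, so a document with a score of exactly 5 lands in the middle bucket.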
