Dec 11th, 2023: [EN] Relevant Search Combining ELSER and BM25 Text Queries

The Elastic Learned Sparse EncodeR (ELSER) lets you run semantic search to get more relevant results. Sometimes, however, combining semantic search results with regular keyword search results gives the best outcome. The question is: how do you combine text and semantic search results?

First, let’s look at a garden-variety text query, using multi_match over the title and description fields. This search has the typical pitfalls of keyword search: the keyword has to exist in some form in a document for that document to be returned, and the context of what the user is searching for isn’t taken into account.

POST search-national-parks/_search
{
  "query": {
    "multi_match": {
      "query": "Where can I see the Northern Lights?",
      "fields": ["title", "description"]
    }
  },
  "_source": ["title"]
}

Now, let’s look at an ELSER query by itself:

POST search-national-parks/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "text_expansion": {
            "ml.inference.title_expanded.predicted_value": {
              "model_id": ".elser_model_2",
              "model_text": "Where can I see the Northern Lights?"
            }
          }
        },
        {
          "text_expansion": {
            "ml.inference.description_expanded.predicted_value": {
              "model_id": ".elser_model_2",
              "model_text": "Where can I see the Northern Lights?"
            }
          }
        }
      ]
    }
  },
  "_source": [
    "title"
  ]
}

The first way to combine these two queries is with a strategy known as linear boosting. In this example, the multi_match clause gets a boost of 4 while each text_expansion clause keeps a boost of 1, so the text search results carry more weight. This may or may not be desirable, depending on the query that you’re running.

POST search-national-parks/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "text_expansion": {
            "ml.inference.title_expanded.predicted_value": {
              "model_id": ".elser_model_2",
              "model_text": "Where can I see the Northern Lights?",
              "boost": 1
            }
          }
        },
        {
          "text_expansion": {
            "ml.inference.description_expanded.predicted_value": {
              "model_id": ".elser_model_2",
              "model_text": "Where can I see the Northern Lights?",
              "boost": 1
            }
          }
        },
        {
          "multi_match": {
            "query": "Where can I see the Northern Lights?",
            "fields": [
              "title",
              "description"
            ],
            "boost": 4
          }
        }
      ]
    }
  },
  "_source": [
    "title"
  ]
}
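
To make the effect of the boosts concrete, here is a minimal Python sketch of the arithmetic. It assumes the usual bool/should behavior, where a document's score is the sum of its matching clauses' scores, each multiplied by that clause's boost. The clause names and per-clause scores below are hypothetical values chosen for illustration, not real Elasticsearch output.

```python
def combined_score(clause_scores, boosts):
    """Sum each matching clause's score multiplied by that clause's boost."""
    return sum(boosts[name] * score for name, score in clause_scores.items())


# Boosts taken from the query above; the clause names are illustrative labels.
boosts = {"elser_title": 1.0, "elser_description": 1.0, "multi_match": 4.0}

# Hypothetical raw scores per clause (a missing key means the clause did not match).
doc_a = {"elser_title": 6.0, "elser_description": 8.0}   # semantic matches only
doc_b = {"elser_description": 3.0, "multi_match": 5.0}   # weaker semantic match plus a keyword match

print(combined_score(doc_a, boosts))  # 14.0
print(combined_score(doc_b, boosts))  # 23.0
```

With these made-up numbers, the document with a keyword match overtakes the one with stronger semantic matches. Because BM25 and ELSER scores are on different scales, the right boost values usually have to be found by experimenting with your own data.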

Finally, we can use Reciprocal Rank Fusion (RRF) to combine text search results with semantic results. RRF re-ranks the merged results based on each document’s rank in the individual sub-searches rather than on its raw score:

POST search-national-parks/_search
{
  "sub_searches": [
    {
      "query": {
        "multi_match": {
          "query": "Where can I see the Northern Lights?",
          "fields": [
            "title",
            "description"
          ]
        }
      }
    },
    {
      "query": {
        "text_expansion": {
          "ml.inference.title_expanded.predicted_value": {
            "model_id": ".elser_model_2",
            "model_text": "Where can I see the Northern Lights?"
          }
        }
      }
    },
    {
      "query": {
        "text_expansion": {
          "ml.inference.description_expanded.predicted_value": {
            "model_id": ".elser_model_2",
            "model_text": "Where can I see the Northern Lights?"
          }
        }
      }
    }
  ],
  "rank": {
    "rrf": {
      "window_size": 10,
      "rank_constant": 20
    }
  },
  "_source": [
    "title", "states"
  ]
}
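
For intuition about what the rank section does, here is a minimal Python sketch of the RRF formula: every document receives 1 / (rank_constant + rank) from each result list it appears in, and those contributions are summed. The document IDs and result lists are hypothetical, and the sketch only approximates the behavior (for example, it ignores how Elasticsearch breaks ties).

```python
def rrf_scores(rankings, window_size=10, rank_constant=20):
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion."""
    scores = {}
    for ranking in rankings:
        # Only the top window_size hits of each sub-search are considered.
        for rank, doc_id in enumerate(ranking[:window_size], start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rank_constant + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical ranked results, one list per sub-search in the request above.
keyword_hits = ["park_a", "park_b", "park_c"]
elser_title_hits = ["park_d", "park_a"]
elser_description_hits = ["park_a", "park_d", "park_c"]

fused = rrf_scores([keyword_hits, elser_title_hits, elser_description_hits])
for doc_id, score in fused:
    print(doc_id, round(score, 4))
```

In this example park_a comes out on top because it ranks highly in all three lists, even though no raw scores are compared. A smaller rank_constant gives top-ranked documents more influence relative to lower-ranked ones.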

These examples should help get you started on your journey to creating the most relevant search results for your use case!

Want to learn more, or just want to play around? Check out Search Labs for information and tutorials like this Search Tutorial to get started with building search solutions using vector search in Elasticsearch.
