If you’ve tried setting up a hybrid search experience, you know the learning curve can be steep. This is especially true when combining text search with semantic search results, because the scores returned by each search can be on vastly different scales. It can be difficult to know where to start, and manually tuning results is error-prone. The good news is that we’re making this process significantly easier.
This post focuses on the defaults, but we’ve built in plenty of customization as well, so the same features support both simple happy-path use cases and powerful, customizable expert use cases.
Indexing your data
First, let’s start with the semantic_text field. Semantic text is designed to feel like a regular text field: it takes care of chunking, inference, and semantic search under the hood. At a minimum, all you need is a mapping for your field:
PUT my-index
{
"mappings": {
"properties": {
"semantic_title": {
"type": "semantic_text"
}
}
}
}
This mapping relies on defaults, such as using our in-house ELSER model for inference. No indexing pipeline configuration required! If you’re interested in trying out semantic_text for the first time, this tutorial is a great place to start.
To ensure that semantic_text works like a regular text field, we’ve added support for the match query on semantic_text fields, so the semantic search happens transparently under the hood:
POST my-index/_search
{
"query": {
"match": {
"semantic_title": "What is semantic search?"
}
}
}
Hybrid search using retrievers
We then built on the fact that many search use cases are well served by match queries and their associated weights. Retrievers have been around for a while now, offering capabilities like RRF, but we simplified the syntax so that it works with any combination of text and semantic_text fields and handles score combination and normalization for you. You can read this blog to learn how this syntactic sugar came to be. Here’s an example of how to do hybrid search using RRF:
GET my-index/_search
{
"retriever": {
"rrf": {
"fields": ["title", "semantic_title"],
"query": "What is semantic search?"
}
}
}
And a similar example using the linear retriever:
GET my-index/_search
{
"retriever": {
"linear": {
"fields": ["title", "semantic_title^2"],
"query": "What is semantic search?",
"normalizer": "l2_norm"
}
}
}
These can be plugged into retrievers like the text_similarity_reranker retriever, so you can use semantic reranking over your hybrid search results. Pretty simple, right?
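To make the linear example a little more concrete, here is a minimal Python sketch of the general idea behind a weighted linear combination with L2 normalization. It is an illustration under assumptions, not Elasticsearch’s implementation: the function names, the sample scores, and the document IDs are all hypothetical, and the weights simply mirror the `semantic_title^2` boost from the request above.

```python
import math

def l2_normalize(scores):
    """Divide each score by the L2 norm of all scores for one field."""
    norm = math.sqrt(sum(s * s for s in scores.values()))
    if norm == 0:
        return {doc_id: 0.0 for doc_id in scores}
    return {doc_id: s / norm for doc_id, s in scores.items()}

def linear_combine(per_field_scores, weights):
    """Sum the weighted, normalized scores of each document across fields."""
    combined = {}
    for field, scores in per_field_scores.items():
        weight = weights.get(field, 1.0)
        for doc_id, s in l2_normalize(scores).items():
            combined[doc_id] = combined.get(doc_id, 0.0) + weight * s
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)

# Hypothetical raw scores: lexical and semantic scores live on different scales.
per_field_scores = {
    "title": {"doc_a": 3.0, "doc_b": 4.0},
    "semantic_title": {"doc_b": 0.6, "doc_c": 0.8},
}
# Mirrors "title" (weight 1) and "semantic_title^2" (weight 2).
weights = {"title": 1.0, "semantic_title": 2.0}

ranked = linear_combine(per_field_scores, weights)
for doc_id, score in ranked:
    print(doc_id, round(score, 3))
```

Normalizing each field before weighting is what keeps a field with large raw scores from drowning out a field with small ones.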
Hybrid search using ES|QL
With that said, some of the most exciting features we’re working on use our new piped query language, ES|QL, which is extremely powerful for hybrid search use cases. If you’ve attended one of our Elastic{ON} events this year, this preview may look a little familiar to you.
Let’s start with the match function. We can call it directly:
FROM my-index METADATA _score
| WHERE match(title, "What is semantic search?")
| SORT _score DESC
or through a shorthand syntax:
FROM my-index METADATA _score
| WHERE title:"What is semantic search?"
| SORT _score DESC
The best part is that because we’re using match, semantic_text is supported out of the box, just like in the DSL!
Now, when you want to combine lexical and semantic results, you can use the FORK command to execute each match query in a separate fork and return them all in the same result:
FROM my-index METADATA _score
| FORK
(WHERE semantic_title: "What is semantic search?" | SORT _score DESC)
(WHERE title:"What is semantic search?" | SORT _score DESC)
You can then use the FUSE command to combine these results using the RRF algorithm:
FROM my-index METADATA _score
| FORK
(WHERE semantic_title: "What is semantic search?" | SORT _score DESC)
(WHERE title:"What is semantic search?" | SORT _score DESC)
| FUSE
| SORT _score DESC
You could also plug this into additional use cases like rerank and completion.
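If you haven’t worked with RRF (reciprocal rank fusion) before, the following Python sketch shows the general algorithm on two ranked lists, like the two forks above. Treat it as a rough illustration rather than what FUSE or the rrf retriever does internally: the function name, the document IDs, and the rank constant of 60 (a commonly used value) are assumptions made for this example.

```python
from collections import defaultdict

def rrf_fuse(ranked_lists, rank_constant=60):
    """Reciprocal rank fusion: each list contributes 1 / (rank_constant + rank)."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (rank_constant + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical results from each fork, already sorted by _score descending.
semantic_hits = ["doc_a", "doc_b", "doc_c"]
lexical_hits = ["doc_b", "doc_d", "doc_a"]

fused = rrf_fuse([semantic_hits, lexical_hits])
for doc_id, score in fused:
    print(doc_id, round(score, 5))
```

Because RRF only looks at ranks, it never needs the lexical and semantic scores to be on the same scale, which is why it makes a convenient default for hybrid search.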
Finally, for some use cases it may make more sense to perform only a lexical search rather than a semantic search. We can also use FORK to decide at query time whether to perform a lexical or a semantic search, for example based on the number of terms in the query, since shorter queries may not benefit as much from semantic search in every use case.
FROM search-index METADATA _score
| EVAL lexical = MV_COUNT(SPLIT(?query, " ")) <= 3
| FORK ( WHERE lexical | WHERE title:?query )
( WHERE NOT lexical | WHERE semantic_title:?query )
| SORT _score DESC
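The routing condition in that query is easy to reason about outside ES|QL. Here is a small Python sketch of the same heuristic, assuming the three-term threshold used above; the function name and the returned labels are hypothetical and only serve the illustration.

```python
def choose_search_mode(query, max_lexical_terms=3):
    """Mirror MV_COUNT(SPLIT(?query, " ")) <= 3: short queries go lexical."""
    term_count = len(query.split(" "))
    return "lexical" if term_count <= max_lexical_terms else "semantic"

print(choose_search_mode("semantic search"))
print(choose_search_mode("What is semantic search?"))
```

A two-term query such as "semantic search" takes the lexical branch, while the four-term question used throughout this post takes the semantic one. The threshold is a tuning knob, so it is worth experimenting with values that suit your own queries.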
This is just a brief example of what we can do, but if you dig into our docs you’ll see how powerful and customizable all of these features are. Try them out; we look forward to hearing your feedback!
