Hi friends! I am building a news search engine and need a way to construct ES queries that deliver relevant results. Currently, I use a combination of must and should clauses in a bool query for every news topic a user searches. My query looks as follows (written with elasticsearch_dsl in Python):
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search, MultiSearch, Q
...
self.search = self.search.query(
    "bool",
    must=[
        Q(
            "multi_match",
            query="Andrew Yang",
            fields=["title", "summary"],
            type="best_fields",
            tie_breaker=0.5
        )
    ],
    should=[
        Q(
            "multi_match",
            query="New Hampshire debate",
            fields=["title", "summary"],
            type="best_fields",
            tie_breaker=1.0
        ),
        Q(
            "multi_match",
            query="election",
            fields=["title", "summary"],
            type="best_fields",
            tie_breaker=1.0
        ),
        ...
    ],
    minimum_should_match=self.min_should_match
)
I initially expected this query to match the query in must exactly (in this example, "Andrew Yang") and then to match as many of the query words in should as possible. However, Elasticsearch does not appear to perform a strict / exact match on the search terms in the must clause. This causes problems when two entirely different news topics share some of their keywords (for example, Andrew Yang and Prince Andrew): when a user searches for one of them, the other can appear in the results as well.
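One likely explanation is that multi_match, like match, combines the analyzed query terms with the or operator by default, so a document containing only "andrew" already satisfies the must clause. The toy model below illustrates the difference between or, and, and phrase matching in plain Python. It is only a sketch: analyze is a rough stand-in for the standard analyzer, the helper names are hypothetical, and the two titles are invented for illustration.

```python
import re
from typing import List


def analyze(text: str) -> List[str]:
    """Rough stand-in for the standard analyzer: lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def matches_or(query: str, doc: str) -> bool:
    """True if the document contains at least one query term."""
    doc_terms = set(analyze(doc))
    return any(term in doc_terms for term in analyze(query))


def matches_and(query: str, doc: str) -> bool:
    """True if the document contains every query term, in any position."""
    doc_terms = set(analyze(doc))
    return all(term in doc_terms for term in analyze(query))


def matches_phrase(query: str, doc: str) -> bool:
    """True if the query terms appear adjacent and in order."""
    q = analyze(query)
    d = analyze(doc)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


# Invented example titles, not real data.
titles = [
    "Andrew Yang qualifies for the debate",
    "Prince Andrew steps back from public duties",
]

for title in titles:
    print(
        title,
        "| or:", matches_or("Andrew Yang", title),
        "| and:", matches_and("Andrew Yang", title),
        "| phrase:", matches_phrase("Andrew Yang", title),
    )
```

In this model the "Prince Andrew" title passes the or check but fails both the and check and the phrase check, which mirrors the behavior described above.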
My question: is there a way to combine, in a single query, an exact match on one or more keywords (let's call them group one keywords) with a partial match (like the multi_match queries in the should clauses) on some other keywords (let's call them group two keywords), so that every returned result definitely contains every group one keyword while matching as many group two keywords as possible? If so, what is the best way to structure such a query so that it is not terribly inefficient?
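A minimal sketch of one way such a query could be structured, assuming that "exact match" means the group one words must appear adjacent and in order. It builds the query body as a plain dict and uses multi_match with type "phrase" in must, keeping best_fields in should. build_news_query and its parameters are hypothetical names introduced for illustration, not part of any library.

```python
from typing import Any, Dict, List


def build_news_query(
    required_phrases: List[str],
    optional_phrases: List[str],
    fields: List[str],
    min_should_match: int = 0,
) -> Dict[str, Any]:
    """Build a bool query body as a plain dict (hypothetical helper)."""
    must = [
        {
            "multi_match": {
                "query": phrase,
                "fields": fields,
                # "phrase" runs a match_phrase query on each field, so the
                # terms must appear adjacent and in the given order.
                "type": "phrase",
            }
        }
        for phrase in required_phrases
    ]
    should = [
        {
            "multi_match": {
                "query": phrase,
                "fields": fields,
                # Same settings as the should clauses in the question.
                "type": "best_fields",
                "tie_breaker": 1.0,
            }
        }
        for phrase in optional_phrases
    ]
    return {
        "bool": {
            "must": must,
            "should": should,
            "minimum_should_match": min_should_match,
        }
    }


query_body = build_news_query(
    required_phrases=["Andrew Yang"],
    optional_phrases=["New Hampshire debate", "election"],
    fields=["title", "summary"],
)
print(query_body)
```

With elasticsearch_dsl, a dict like this can be wrapped as Q(query_body) and passed to Search.query. If word order and adjacency do not matter, keeping type "best_fields" and adding "operator": "and" to the must clauses is an alternative, though with best_fields the operator is applied per field, so all terms would need to occur in the same field.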
P.S. Both the title and summary fields are indexed with type=text (standard analyzer). Here is the full mapping that I am currently using:
NEWS_INDEX_MAPPING = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "source": {"type": "keyword"},
            "category": {"type": "text"},
            "id": {"type": "keyword"},
            "summary": {"type": "text"},
            "url": {"type": "keyword"},
            "published_date": {"type": "keyword"},
            "img_url": {"type": "text"},
            "views": {"type": "long"},
            "avg_rating": {"type": "float"},
            "num_rated": {"type": "long"}
        }
    }
}
I would appreciate any suggestions or comments. Thanks!