and I want to build a single query that tells us which of these keyword sets have any match in the text.
For example, if the text is the following:
I went to the movie theater, which was down by the beach
the query should return [set_1, set_2].
And for the following text:
The office for my workplace is downtown
it should return [set_3].
Is there a simple way of doing this? Ideally we want to fit it into one query. Our idea was to roll something by hand using highlight in the query, but that seems like it would be difficult.
Are set_1, set_2 and set_3 different documents? If so, I believe the should clause can help you.
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "set_1": {
              "query": "I went to the movie theater, which was down by the beach"
            }
          }
        },
        {
          "match": {
            "set_2": {
              "query": "I went to the movie theater, which was down by the beach"
            }
          }
        },
        {
          "match": {
            "set_3": {
              "query": "I went to the movie theater, which was down by the beach"
            }
          }
        }
      ]
    }
  },
  "highlight": {
    "fields": {
      "set_1": {},
      "set_2": {},
      "set_3": {}
    }
  }
}
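With hundreds of sets you would probably not write that body by hand. Here is a minimal Python sketch that generates the same bool/should structure from a list of field names. The helper name build_should_query is made up for illustration, and it assumes, as in the query above, that each keyword set is stored in its own field (set_1, set_2, ...).

```python
import json


def build_should_query(text, set_fields):
    """Build a bool/should query matching `text` against each keyword-set field.

    `set_fields` is assumed to be the list of field names that hold the
    keyword sets (hypothetical layout, mirroring the hand-written query).
    """
    return {
        "query": {
            "bool": {
                "should": [
                    {"match": {field: {"query": text}}} for field in set_fields
                ]
            }
        },
        # Ask for highlights on every keyword-set field.
        "highlight": {"fields": {field: {} for field in set_fields}},
    }


body = build_should_query(
    "I went to the movie theater, which was down by the beach",
    ["set_1", "set_2", "set_3"],
)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the request body of a _search call.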
The idea is that we have a series of large bodies of text indexed in our Elasticsearch clusters; for example, one would be a news article.
We then want to use an Elasticsearch query to "tag" the article.
We build up all the sets of keywords, for example:
set_1 is the "cinema keywords set" which I described above
set_2 is the "ocean keywords set"
set_3 is the "office keywords set"
We may have hundreds of these keyword sets.
Then we want to use Elasticsearch to tag a specific article with the keyword sets. For example, for a document that mentions cinema and ocean keywords, the query should return those two tags.
Of course there are ways of doing this ourselves, but having Elasticsearch do it would be much easier: we don't want to load the full body of text, and we also want to avoid rolling our own lemmatization logic, etc.
I think I understand. You want to pass a list of terms and check whether they exist in the text; if so, those terms become tags that can later be indexed in the document to facilitate search. Right?
In that case, you pass the list using the Terms Query and, where a match happens, use highlight to mark the text. With the terms marked, you can build the list of tags for the document.
The problem I see is that it is not possible to send the whole list of sets in one request and get back which set each match belongs to. The option is to make a separate request for each set.
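As a rough sketch of that per-set approach, the Python below builds one terms query with highlight for each set. The helper name build_terms_request and the keywords in keyword_sets are hypothetical, since the real lists are not shown here, and the field name text is taken from the sample response in this thread. Keep in mind that a terms query matches exact indexed terms and does not analyze its input, so the keywords would have to be in the same form as the indexed tokens (for example, lowercase).

```python
def build_terms_request(keywords, text_field="text"):
    """Build a search body: terms query on `text_field`, highlighted on the same field."""
    return {
        "query": {"terms": {text_field: list(keywords)}},
        "highlight": {"fields": {text_field: {}}},
    }


# Hypothetical keyword sets; the real ones are not listed in this thread.
keyword_sets = {
    "set_1": ["movie", "theater"],
    "set_2": ["beach", "ocean"],
    "set_3": ["office", "workplace"],
}

# One request body per set, keyed by set name.
requests_by_set = {
    name: build_terms_request(keywords) for name, keywords in keyword_sets.items()
}
```

Each body in requests_by_set would be sent as its own search, and any set whose search returns the document counts as a tag for it.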
Wouldn't it be better for you to have a service that infers the tags of the texts instead of trying to do it through Elasticsearch?
Thank you! That is the idea we had as well. Going from Highlights -> back to search terms makes sense.
Yes, I agree that such a service would be useful; however, we wanted to keep our documents indexed in only one place (Elasticsearch) if possible.
"hits": [
{
"_index": "bar",
"_id": "SzOHmYUBQB-6H-4ZDqaL",
"_score": 1,
"_source": {
"text": "I went to the movie theater, which was down by the beach"
},
"highlight": {
"text": [
"I went to the <em>movie</em> <em>theater</em>, which was down by the beach"
]
}
}
]
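Going from highlights back to keyword sets could look something like the Python sketch below. It assumes the default <em> highlight tags shown in the response above; the helper names and the contents of keyword_sets are hypothetical. It compares the highlighted words as they appear in the text, so if the field uses stemming the highlighted form may differ from the keyword and would need normalizing first.

```python
import re

# Default highlighter wraps matched terms in <em>...</em>.
EM_PATTERN = re.compile(r"<em>(.*?)</em>")


def highlighted_terms(hit, field="text"):
    """Collect the lowercased terms wrapped in <em> tags for one hit."""
    fragments = hit.get("highlight", {}).get(field, [])
    return {
        term.lower() for fragment in fragments for term in EM_PATTERN.findall(fragment)
    }


def tags_for_hit(hit, keyword_sets, field="text"):
    """Return the names of the keyword sets that share a term with the highlights."""
    terms = highlighted_terms(hit, field)
    return sorted(
        name
        for name, keywords in keyword_sets.items()
        if terms & {keyword.lower() for keyword in keywords}
    )


# Hypothetical keyword sets; the real ones are not listed in this thread.
keyword_sets = {
    "set_1": ["movie", "theater"],
    "set_2": ["beach", "ocean"],
    "set_3": ["office", "workplace"],
}

hit = {
    "highlight": {
        "text": [
            "I went to the <em>movie</em> <em>theater</em>, which was down by the beach"
        ]
    }
}

print(tags_for_hit(hit, keyword_sets))  # ['set_1']
```

Only movie and theater are highlighted in that hit, so with these example sets only set_1 comes back.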