How do I find which parts of an Elasticsearch query match

amirs5 · January 9, 2023, 6:58pm

I have a list of keyword sets, for example:

set_1: ["movie", "cinema", "theater"],
set_2: ["beach", "ocean", "dock"],
set_3: ["office", "downtown", "center"]

and I want to build a single query that tells us which of these keyword sets have any match in the text.

For example, if the text is the following

I went to the movie theater, which was down by the beach

The query should return [set_1, set_2]

and the following text

The office for my workplace is downtown

Should return [set_3]

Is there a simple way of doing this? Ideally we want to fit it in one query. Our idea was to roll something by hand using highlight in the query, but that seems like it would be difficult.

RabBit_BR · January 9, 2023, 7:32pm

Hi @amirs5

Are set_1, set_2 and set_3 different documents? If so, I believe the should clause can help you.

{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "set_1": {
              "query": "I went to the movie theater, which was down by the beach"
            }
          }
        },
        {
          "match": {
            "set_2": {
              "query": "I went to the movie theater, which was down by the beach"
            }
          }
        },
        {
          "match": {
            "set_3": {
              "query": "I went to the movie theater, which was down by the beach"
            }
          }
        }
      ]
    }
  },
  "highlight": {
    "fields": {
      "set_1": {},
      "set_2": {},
      "set_3": {}
    }
  }
}

Hits:

"hits": [
      {
        "_index": "bar",
        "_id": "MjP_l4UBQB-6H-4ZoKZr",
        "_score": 0.5753642,
        "_source": {
          "set_1": [
            "movie",
            "cinema",
            "theater"
          ]
        },
        "highlight": {
          "set_1": [
            "<em>movie</em>",
            "<em>theater</em>"
          ]
        }
      },
      {
        "_index": "bar",
        "_id": "MzP_l4UBQB-6H-4ZoKZr",
        "_score": 0.2876821,
        "_source": {
          "set_2": [
            "beach",
            "ocean",
            "dock"
          ]
        },
        "highlight": {
          "set_2": [
            "<em>beach</em>"
          ]
        }
      }
    ]
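
For a response shaped like the one above, the matched set names can be read from the keys of each hit's highlight object. A minimal Python sketch, assuming the response has already been parsed into a dict; the function name matched_set_fields is hypothetical and the sample data is trimmed from the hits above:

```python
def matched_set_fields(hits):
    """Return the sorted names of fields that produced a highlight."""
    matched = set()
    for hit in hits:
        # A field shows up under "highlight" only when it produced a fragment.
        matched.update(hit.get("highlight", {}).keys())
    return sorted(matched)


# Trimmed copy of the hits shown above.
sample_hits = [
    {"_id": "MjP_l4UBQB-6H-4ZoKZr",
     "highlight": {"set_1": ["<em>movie</em>", "<em>theater</em>"]}},
    {"_id": "MzP_l4UBQB-6H-4ZoKZr",
     "highlight": {"set_2": ["<em>beach</em>"]}},
]

print(matched_set_fields(sample_hits))  # ['set_1', 'set_2']
```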

amirs5 · January 9, 2023, 9:28pm

Hi, thank you for the response.

I'm looking for something slightly different: set_1, set_2 and set_3 are all part of the query.

"I went to the movie theater, which was down by the beach" is an example of a potential document.

RabBit_BR · January 9, 2023, 11:21pm

What Elasticsearch returns is the documents that match the list of terms in the input.

Why do you want to do this?

amirs5 · January 10, 2023, 12:57am

The idea is that we have a series of large bodies of text indexed in our Elasticsearch clusters; one example would be a news article.

We then want to use the Elasticsearch query to "tag" the article text.

We build up all the sets of keywords, for example:

set_1 is the "cinema keywords set", which I described above
set_2 is the "ocean keywords set"
set_3 is the "office keywords set"

We may have hundreds of these keyword sets.

Then we want to use Elasticsearch to tag a specific article with the keyword sets. For example, for a document that mentions cinema and ocean keywords, the query would return those two tags.

Of course there are ways of doing this on our own; however, doing it in Elasticsearch would be much easier, as we don't want to load the full body of text, and we also want to avoid rolling our own lemmatization logic, etc.

I hope this makes more sense. Thank you!

RabBit_BR · January 10, 2023, 1:31am

I think I understand. You want to pass a list of terms and check whether they exist in the text; if they do, those terms become tags that can later be indexed in the document to facilitate search. Right?

In this case, you pass the list using the Terms Query; where a match happens, you use the highlight to mark the text. With the terms marked, you create the list of tags in the document.

The problem I see is that it is not possible to send the list of sets and make the correspondences you want. The option is to make requests for each set separately.
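
Issuing one request per set could look roughly like the sketch below. It only builds the request bodies, so it runs without a cluster; the field name text and the helper name build_set_queries are assumptions for illustration, and sending the bodies is left to whichever client you use.

```python
keyword_sets = {
    "set_1": ["movie", "cinema", "theater"],
    "set_2": ["beach", "ocean", "dock"],
    "set_3": ["office", "downtown", "center"],
}


def build_set_queries(sets, field="text"):
    """Build one search body (terms query plus highlight) per keyword set."""
    bodies = {}
    for name, keywords in sets.items():
        bodies[name] = {
            # Matches documents whose field contains any of the keywords.
            "query": {"terms": {field: list(keywords)}},
            # Ask for the matching terms to be marked in the returned text.
            "highlight": {"fields": {field: {}}},
        }
    return bodies


bodies = build_set_queries(keyword_sets)
print(bodies["set_2"]["query"])
```

Note that a terms query does not analyze its input, so the keywords need to match the indexed tokens exactly (for example, lowercase when the field uses the standard analyzer).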

Wouldn't it be better for you to have a service that infers the tags of the texts instead of trying to do it through Elasticsearch?

amirs5 · January 10, 2023, 2:28am

Thank you! That is the idea we had as well. Going from highlights back to search terms makes sense.

Yes, I agree that having such a service would be useful; however, we wanted to keep our documents indexed in only one place (Elasticsearch) if possible.

RabBit_BR · January 10, 2023, 2:37am

Following the logic of the Terms Query, the closest you can get to the result you want would be the following, shown for set_1. You would then have to post-process the highlight result.

{
  "query": {
    "terms": {
      "text": [
        "movie",
        "cinema",
        "theater"
      ]
    }
  },
  "highlight": {
    "fields": {
      "text": {}
    }
  }
}

Hits:

"hits": [
      {
        "_index": "bar",
        "_id": "SzOHmYUBQB-6H-4ZDqaL",
        "_score": 1,
        "_source": {
          "text": "I went to the movie theater, which was down by the beach"
        },
        "highlight": {
          "text": [
            "I went to the <em>movie</em> <em>theater</em>, which was down by the beach"
          ]
        }
      }
    ]
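
The post-processing of the highlight result could be sketched as below: pull the terms out of the default <em> tags and look up which keyword sets contain them. This is a minimal sketch, not a definitive implementation; the helper name tags_from_highlight is hypothetical, and it assumes the highlighted text equals the keyword itself.

```python
import re

# Default highlighter tags; adjust if pre_tags/post_tags are customised.
EM_TAG = re.compile(r"<em>(.*?)</em>")

keyword_sets = {
    "set_1": ["movie", "cinema", "theater"],
    "set_2": ["beach", "ocean", "dock"],
    "set_3": ["office", "downtown", "center"],
}


def tags_from_highlight(fragments, sets):
    """Map highlighted terms back to the names of the sets containing them."""
    found = {term.lower() for frag in fragments for term in EM_TAG.findall(frag)}
    return sorted(name for name, kws in sets.items() if found & set(kws))


# Highlight fragment copied from the hit above (only set_1 was queried).
fragments = [
    "I went to the <em>movie</em> <em>theater</em>, which was down by the beach"
]
print(tags_from_highlight(fragments, keyword_sets))  # ['set_1']
```

With a stemming analyzer the highlighted surface form may differ from the keyword (for example a plural), in which case the lookup would need the same normalisation applied to both sides.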

