How do I find which parts of an Elasticsearch query match

I have a list of keyword sets, for example:

set_1: ["movie", "cinema", "theater"],
set_2: ["beach", "ocean", "dock"],
set_3: ["office", "downtown", "center"]

and I want to build a single query that tells us which of these keyword sets have any match in the text.

For example, if the text is the following

I went to the movie theater, which was down by the beach

The query should return [set_1, set_2]

and the following text

The office for my workplace is downtown

Should return [set_3]

Is there a simple way of doing this? Ideally we want to fit it in one query. One idea was to roll something by hand using highlight in the query, but that seems like it would be difficult.

Hi @amirs5

Are set_1, set_2 and set_3 different documents? If so, I believe the should clause can help you.

{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "set_1": {
              "query": "I went to the movie theater, which was down by the beach"
            }
          }
        },
        {
          "match": {
            "set_2": {
              "query": "I went to the movie theater, which was down by the beach"
            }
          }
        },
        {
          "match": {
            "set_3": {
              "query": "I went to the movie theater, which was down by the beach"
            }
          }
        }
      ]
    }
  },
  "highlight": {
    "fields": {
      "set_1": {},
      "set_2": {},
      "set_3": {}
    }
  }
}

Hits:

"hits": [
      {
        "_index": "bar",
        "_id": "MjP_l4UBQB-6H-4ZoKZr",
        "_score": 0.5753642,
        "_source": {
          "set_1": [
            "movie",
            "cinema",
            "theater"
          ]
        },
        "highlight": {
          "set_1": [
            "<em>movie</em>",
            "<em>theater</em>"
          ]
        }
      },
      {
        "_index": "bar",
        "_id": "MzP_l4UBQB-6H-4ZoKZr",
        "_score": 0.2876821,
        "_source": {
          "set_2": [
            "beach",
            "ocean",
            "dock"
          ]
        },
        "highlight": {
          "set_2": [
            "<em>beach</em>"
          ]
        }
      }
    ]

Hi, thank you for the response.

I'm looking for something slightly different: set_1, set_2 and set_3 are all part of the query.

"I went to the movie theater, which was down by the beach" is an example of a potential document.

What Elasticsearch returns are the documents that match the list of terms in the input.

Why do you want to do this?

The idea is that we have a series of large bodies of text indexed in our Elasticsearch clusters; one example would be a news article.

We then want to use the Elasticsearch query to "tag" the article.

We build up all the sets of keywords, for example:

set_1 is the "cinema keywords set" described above
set_2 is the "ocean keywords set"
set_3 is the "office keywords set"

We may have hundreds of these keyword sets.

Then we want to use Elasticsearch to tag a specific article with the keyword sets. For example, for a document that mentions cinema and ocean keywords, the query would return those two tags.

Of course there are ways of doing this on our own; however, something like this would be much easier, as we don't want to load the full body of text, and we also want to avoid rolling our own lemmatization logic, etc.

I hope this makes more sense. Thank you!

I think I understand. You want to pass a list of terms and check whether they exist in the text; if they do, these terms become tags that can later be indexed in the document to facilitate search. Right?

In this case, you pass the list using the Terms Query and, where a match happens, you use the highlight to mark the text. With the terms marked, you create the list of tags in the document.

The problem I see is that it is not possible to send the list of sets and get back the correspondences you want. The option is to make a request for each set separately.
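
One thing that may be worth testing before splitting this into one request per set is Elasticsearch's named queries: each clause can carry a `_name`, and every hit then lists the names of the clauses it matched under `matched_queries`. Below is a minimal Python sketch that only builds the request body and reads a response hit. The helper names `build_tagging_query` and `tags_from_hit`, the `text` field, and the sample hit are assumptions for illustration, and whether this holds up with hundreds of sets would need checking on your own cluster.

```python
import json


def build_tagging_query(keyword_sets, field="text"):
    """Build one bool/should query with a named terms clause per keyword set.

    `keyword_sets` maps a set name to its list of keywords (hypothetical shape).
    """
    should = [
        {"terms": {field: list(words), "_name": name}}
        for name, words in keyword_sets.items()
    ]
    return {
        # Skip returning the document body; only the matched names are needed.
        "_source": False,
        "query": {"bool": {"should": should, "minimum_should_match": 1}},
    }


def tags_from_hit(hit):
    """Return the sorted names of the clauses that matched this hit."""
    return sorted(hit.get("matched_queries", []))


keyword_sets = {
    "set_1": ["movie", "cinema", "theater"],
    "set_2": ["beach", "ocean", "dock"],
    "set_3": ["office", "downtown", "center"],
}

body = build_tagging_query(keyword_sets)
print(json.dumps(body, indent=2))

# Hypothetical hit, shaped like a search response entry, for illustration only.
sample_hit = {"_index": "bar", "_id": "example", "matched_queries": ["set_2", "set_1"]}
print(tags_from_hit(sample_hit))
```

Because `terms` clauses are not analyzed, the keywords have to match the indexed tokens exactly. Swapping each clause for a `match` query, which also accepts `_name`, would run the keywords through the field's analyzer instead.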

Wouldn't it be better for you to have a service that infers the tags of the texts instead of trying to do it through Elasticsearch?

Thank you! That is the idea we had as well. Going from Highlights -> back to search terms makes sense.

Yes, I agree that such a service would be useful; however, we wanted to keep our documents indexed in only one place (Elasticsearch) if possible.

Following the logic of the Terms Query, the closest you can get to the result you want would be this query for set_1. You would then have to post-process the highlight result.

{
  "query": {
    "terms": {
      "text": [
        "movie",
        "cinema",
        "theater"
      ]
    }
  },
  "highlight": {
    "fields": {
      "text": {}
    }
  }
}

Hits:

"hits": [
      {
        "_index": "bar",
        "_id": "SzOHmYUBQB-6H-4ZDqaL",
        "_score": 1,
        "_source": {
          "text": "I went to the movie theater, which was down by the beach"
        },
        "highlight": {
          "text": [
            "I went to the <em>movie</em> <em>theater</em>, which was down by the beach"
          ]
        }
      }
    ]
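
Going from the highlights back to the keyword sets could be sketched roughly as follows. This is a minimal Python example, assuming the default `<em>` highlight tags, a field named `text`, and a plain lower-cased comparison between the highlighted words and the keywords; the function names `highlighted_terms` and `tags_for_hit` are made up for illustration. If the field's analyzer stems words, the highlighted surface form may differ from the keyword and this simple lookup would miss it.

```python
import re

# Default highlighter tags; an assumption that pre_tags/post_tags are not customised.
EM_PATTERN = re.compile(r"<em>(.*?)</em>")


def highlighted_terms(hit, field="text"):
    """Collect the lower-cased words wrapped in <em> tags for one hit."""
    fragments = hit.get("highlight", {}).get(field, [])
    return {m.lower() for frag in fragments for m in EM_PATTERN.findall(frag)}


def tags_for_hit(hit, keyword_sets, field="text"):
    """Return the names of the keyword sets with at least one highlighted word."""
    found = highlighted_terms(hit, field)
    return [name for name, words in keyword_sets.items() if found & set(words)]


keyword_sets = {
    "set_1": ["movie", "cinema", "theater"],
    "set_2": ["beach", "ocean", "dock"],
    "set_3": ["office", "downtown", "center"],
}

# The hit shown above, reduced to the part that matters here.
hit = {
    "highlight": {
        "text": [
            "I went to the <em>movie</em> <em>theater</em>, which was down by the beach"
        ]
    }
}

print(tags_for_hit(hit, keyword_sets))  # ['set_1']
```

If the terms of all sets were sent in a single Terms Query, the same function would return every set that has a highlighted keyword.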
