Tag a document if it matches any term in a keyword list

I have a list of about 1,000 terms, and I want a Logstash filter to tag a document if any of those terms appears in a text field.
Example:
text : "The big brown dog"
My list of 1,000 terms includes "Dog".
So Logstash tags that document with "Animal".
Note that I would like the match to be case-insensitive as well.

The reason I would like to do this is that I have tried queries like the one below, and they just don't perform well when they search for 1,000+ words across millions of documents. If I could build a better-performing tagging system, I could simply filter for the documents that have that tag.
Is anyone out there doing keyword extraction like this to tag documents coming through Logstash?

{
  "query": {
    "bool": {
      "minimum_should_match": 1,
      "should": [
        {
          "match_phrase": {
            "text": "Dog"
          }
        },
        {
          "match_phrase": {
            "text": "Cat"
          }
        },
        {
          "match_phrase": {
            "text": "1000+ Animal Types"
          }
        }
      ]
    }
  }
}

It is unclear whether you want to test if a field is equal to one of the values, or whether it contains one of the values (i.e. a substring). In either case you can use a translate filter.

That said, in the substring case, matching a field against a thousand regular expressions (*dog*, *cat*, etc.) is going to be expensive. You can make the match case-insensitive by running the translate filter against a copy of the field that you have lowercased with a mutate filter.
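As a rough illustration of that suggestion, a minimal sketch of the filter section might look like the following. It assumes the field is named "text" as in the question, and a recent version of the translate plugin in which the options are called source and target (older releases use field and destination instead). The [@metadata] field names and the two dictionary entries are made up for the example.

```
filter {
  # Work on a copy so the original "text" field keeps its casing.
  # [@metadata] fields are not sent to the output.
  mutate {
    copy => { "text" => "[@metadata][text_lc]" }
  }

  # Use a second mutate block: inside a single mutate, lowercase is
  # applied before copy, so the copy would not exist yet.
  mutate {
    lowercase => [ "[@metadata][text_lc]" ]
  }

  # regex => true treats each dictionary key as a regular expression, so
  # "dog" matches anywhere in the lowercased text (the substring case).
  translate {
    source     => "[@metadata][text_lc]"
    target     => "[@metadata][category]"   # hypothetical field name
    regex      => true
    dictionary => {
      "dog" => "Animal"
      "cat" => "Animal"
    }
  }

  # Tag the event only when the translate filter found a match.
  if [@metadata][category] {
    mutate {
      add_tag => [ "%{[@metadata][category]}" ]
    }
  }
}
```

With a list of 1,000 terms, the inline dictionary would probably be replaced by an external file referenced through the translate filter's dictionary_path option. Keep in mind that a bare key such as "dog" also matches inside longer words like "hotdog"; adding word boundaries to the keys would tighten that, at some extra cost per event.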

Thanks for this. I believe I have a plan going forward with this as the solution.
