Percolate query alternative for given use case

I'm looking for a way to achieve what the percolate query currently does for me, but with less overhead.

I have an index where each document is a product (say, Acme Anvils). I want to determine whether a piece of text, such as a customer review, mentions a specific product. Currently, each product document has a field of type percolator whose value is a match_phrase query on the product name. Here are the index template and an example document:

Index Template:

PUT /_template/product-percolate-test
{
  "index_patterns": [
    "product-percolate-test"
  ],
  "settings": {
    "refresh_interval": "1s",
    "number_of_shards": 5,
    "number_of_replicas": 1,
    "analysis": {
      "filter": {
        "english_stop": {
          "type": "stop",
          "stopwords": "_english_"
        },
        "english_stemmer": {
          "type": "stemmer",
          "language": "english"
        },
        "english_possessive_stemmer": {
          "type": "stemmer",
          "language": "possessive_english"
        }
      },
      "analyzer": {
        "english_search": {
          "tokenizer": "standard",
          "filter": [
            "english_possessive_stemmer",
            "lowercase",
            "asciifolding",
            "english_stop",
            "english_stemmer"
          ]
        },
        "english_mention": {
          "tokenizer": "uax_url_email",
          "filter": [
            "lowercase",
            "asciifolding",
            "english_stop"
          ]
        }
      }
    }
  },
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "abv": {
        "type": "double"
      },
      "name": {
        "type": "text",
        "fields": {
          "analyzed": {
            "type": "text",
            "analyzer": "english_search"
          }
        }
      },
      "manufacturer": {
        "type": "text",
        "fields": {
          "analyzed": {
            "type": "text",
            "analyzer": "english_search"
          }
        }
      },
      "percolator-message": {
        "type": "text",
        "analyzer": "english_mention"
      },
      "percolator-query": {
        "type": "percolator"
      }
    }
  }
}

Example Product Document

{
  "_index" : "product-percolate-test",
  "_type" : "_doc",
  "_id" : "00c68132-19b5-488f-8ea9-ee4b5d97a996",
  "_score" : 1.0,
  "_source" : {
    "name" : "Anvil 3000 Deluxe",
    "manufacturer" : "Acme",
    "percolator-query" : {
      "match_phrase" : {
        "percolator-message" : {
          "query" : "Anvil 3000 Deluxe",
          "slop" : 1
        }
      }
    }
  }
}

With this, I can run the following percolate query and get these results:

Percolate Query

GET /product-percolate-test/_search
{
  "query": {
    "percolate": {
      "field": "percolator-query",
      "document": {
        "percolator-message" : "I purchased the Anvil 3000 Deluxe from Acme about a week ago. I would not recommend this product. Although it is a good anvil, I have yet been able to successfully catch any road runners with it. It instead keeps falling on my head."
      }
    }
  }
}

Percolate Results

{
  "took" : 33,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.6538229,
    "hits" : [
      {
        "_index" : "product-percolate-test",
        "_type" : "_doc",
        "_id" : "00c68132-19b5-488f-8ea9-ee4b5d97a996",
        "_score" : 0.6538229,
        "_source" : {
          "name" : "Anvil 3000 Deluxe",
          "manufacturer" : "Acme",
          "percolator-query" : {
            "match_phrase" : {
              "percolator-message" : {
                "query" : "Anvil 3000 Deluxe",
                "slop" : 1
              }
            }
          }
        },
        "fields" : {
          "_percolator_document_slot" : [
            0
          ]
        }
      }
    ]
  }
}
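
The `_percolator_document_slot` field in the response above becomes useful when several reviews are percolated in one request: the percolate query also accepts a `documents` array, and each hit then reports the positions of the documents it matched. Below is a minimal Python sketch of building such a request body and reading the slots back. The helper names `build_percolate_body` and `matches_by_slot` are made up for illustration, and only plain dictionaries are used, so no particular client library is assumed.

```python
from collections import defaultdict


def build_percolate_body(reviews, field="percolator-query",
                         message_field="percolator-message"):
    """Build a percolate search body that checks several reviews at once."""
    return {
        "query": {
            "percolate": {
                "field": field,
                # One candidate document per review, in list order.
                "documents": [{message_field: text} for text in reviews],
            }
        }
    }


def matches_by_slot(response):
    """Map each review position (slot) to the product names that matched it."""
    matched = defaultdict(list)
    for hit in response["hits"]["hits"]:
        for slot in hit["fields"]["_percolator_document_slot"]:
            matched[slot].append(hit["_source"]["name"])
    return dict(matched)
```

The body returned by `build_percolate_body` would be sent to the same `_search` endpoint as the single-document request, and `matches_by_slot` expects the parsed JSON response. Whether batching actually lowers the overhead for a given workload would need to be measured.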

Here are the issues I see with this:

  1. There is a stored percolator query for each product document, and the number of documents is likely to reach the tens of thousands.
  2. The stored query for each product document is identical except for the match_phrase query value, which is always the product name.
  3. If I need to change the queries, I would have to modify every document, likely through a reindex operation.
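
To make points 2 and 3 concrete: because the stored query differs only by product name, its shape could be defined once in application code and stamped onto each product at index time. The Python sketch below is only an illustration under that assumption. The helper names `mention_query` and `product_document` are hypothetical, and the code builds document bodies only, leaving the indexing call to whichever client is in use. It does not remove the per-document queries or the need to rewrite documents when the shape changes; it just keeps the template in one place.

```python
def mention_query(product_name, slop=1, message_field="percolator-message"):
    """Return the match_phrase query stored in a product's percolator field."""
    return {
        "match_phrase": {
            message_field: {
                "query": product_name,
                "slop": slop,
            }
        }
    }


def product_document(name, manufacturer, slop=1):
    """Build a product document whose stored query is derived from its name."""
    return {
        "name": name,
        "manufacturer": manufacturer,
        "percolator-query": mention_query(name, slop=slop),
    }
```

Calling `product_document("Anvil 3000 Deluxe", "Acme")` yields the same `_source` as the example product document shown earlier.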

I am wondering whether there is a more efficient way to achieve the same result.

Thank you
