Phrase suggester not giving suggestion when there are duplicate entries


(Janaka Bandara) #1

Hi, I was experimenting with the phrase suggester, using the sample code from the Elasticsearch documentation.

Mapping

PUT /test?pretty
{
  "settings": {
    "index": {
      "number_of_shards": 1,
      "analysis": {
        "analyzer": {
          "trigram": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": [
              "standard",
              "shingle"
            ]
          },
          "reverse": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": [
              "standard",
              "reverse"
            ]
          }
        },
        "filter": {
          "shingle": {
            "type": "shingle",
            "min_shingle_size": 2,
            "max_shingle_size": 3
          }
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "title": {
          "type": "text",
          "fields": {
            "trigram": {
              "type": "text",
              "analyzer": "trigram"
            },
            "reverse": {
              "type": "text",
              "analyzer": "reverse"
            }
          }
        }
      }
    }
  }
}

Data

POST /test/test?refresh=true&pretty
{"title": "noble warriors"}
POST /test/test?refresh=true&pretty
{"title": "nobel prize"}

Search query

POST test/_search
{
  "suggest": {
    "text": "noble prize",
    "simple_phrase": {
      "phrase": {
        "field": "title.trigram",
        "size": 1,
        "gram_size": 3,
        "direct_generator": [ {
          "field": "title.trigram",
          "suggest_mode": "always"
        } ],
        "highlight": {
          "pre_tag": "<em>",
          "post_tag": "</em>"
        }
      }
    }
  }
}

The above works.
But when I index more duplicate documents, as follows, the phrase suggester stops returning any results.

POST /test/test?refresh=true&pretty
{"title": "noble warriors"}
POST /test/test?refresh=true&pretty
{"title": "nobel prize"}

POST /test/test?refresh=true&pretty
{"title": "noble warriors"}
POST /test/test?refresh=true&pretty
{"title": "nobel prize"}

Any idea why this is happening?

Thank you,


(Abdon Pijpelink) #2

I think what you're running into here is that common (high-frequency) terms are excluded from suggestions by default. By indexing the same text over and over again, you make those terms very common, so they get excluded from suggestions.

If you really want those terms to be suggested, you could add the max_term_freq parameter to direct_generator with a very high fraction (for example 0.999). Your example would become:

POST test/_search
{
  "suggest": {
    "text": "noble prize",
    "simple_phrase": {
      "phrase": {
        "field": "title.trigram",
        "size": 1,
        "gram_size": 3,
        "direct_generator": [ {
          "field": "title.trigram",
          "max_term_freq": 0.999,
          "suggest_mode": "always"
        } ],
        "highlight": {
          "pre_tag": "<em>",
          "post_tag": "</em>"
        }
      }
    }
  }
}
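
To make the effect of the threshold concrete, here is a rough Python sketch of the arithmetic. It assumes the check behaves the way Lucene's DirectSpellChecker appears to apply it (a fraction is scaled by the document count and rounded up; a value of 1 or more is read as an absolute document count) and that the default max_term_freq is 0.01, as the documentation states. The function name is hypothetical and the exact internals may differ between versions.

```python
import math

def is_spellchecked(doc_freq: int, max_doc: int, max_term_freq: float = 0.01) -> bool:
    """Return True if a term from the suggest text is still a candidate for correction.

    Hypothetical helper: mirrors the threshold check as assumed above,
    not the actual Elasticsearch/Lucene code.
    """
    if max_term_freq >= 1:
        # Values of 1 or more are treated as an absolute document count.
        return doc_freq <= max_term_freq
    # Fractions are scaled by the number of documents and rounded up.
    return doc_freq <= math.ceil(max_term_freq * max_doc)

# Two documents, "noble" appears in one of them: still corrected.
print(is_spellchecked(doc_freq=1, max_doc=2))
# Six documents, "noble" appears in three: skipped with the default threshold.
print(is_spellchecked(doc_freq=3, max_doc=6))
# Same index with max_term_freq raised to 0.999: corrected again.
print(is_spellchecked(doc_freq=3, max_doc=6, max_term_freq=0.999))
```

Under these assumptions, the default threshold on a tiny index rounds up to a single document, so any term that appears in two or more documents is treated as correctly spelled and left alone. On a realistically sized index the default is much less likely to bite.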

(Janaka Bandara) #3

Thank you @abdon for a perfect explanation.

