Edge nGram token filter doesn't seem to work


(havetobe unknown) #1

Hi everyone,

I'm trying to get the Edge nGram token filter working based on the following documentation: https://www.elastic.co/guide/en/elasticsearch/guide/current/_index_time_search_as_you_type.html

I'm using version 1.7 and I can't get the result described in the docs. Here are my index settings:

$ curl -XPUT --data-binary @index_settings.json http://localhost:9200/test_ngram

{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0,
    "mappings": {
      "my_type": {
        "properties": {
          "name": {
            "type": "string",
            "analyzer": "autocomplete"
          }
        }
      }
    },
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type":     "edge_ngram",
          "min_gram": 1,
          "max_gram": 20
        }
      },
      "analyzer": {
        "autocomplete": {
          "type":      "custom",
          "tokenizer": "standard",
          "filter": [ "lowercase", "autocomplete_filter" ]
        }
      }
    }
  }
}

I'm indexing some objects:

$ curl -XPOST --data-binary @docs.json http://localhost:9200/test_ngram/my_type/_bulk

{ "index": { "_id": 1            }}
{ "name": "Brown foxes"    }
{ "index": { "_id": 2            }}
{ "name": "Yellow furballs" }

Then I run the search:

$ curl -XPOST --data-binary @search.json localhost:9200/test_ngram/my_type/_search?pretty

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.15891947,
    "hits" : [ {
      "_index" : "test_ngram",
      "_type" : "my_type",
      "_id" : "1",
      "_score" : 0.15891947,
      "_source":{ "name": "Brown foxes"    }
    } ]
  }
}

This returns only the first document and not the second one, unlike the result shown in the documentation. I ran an explain on the query, and the result shows that it doesn't seem to be analyzed with the edge_ngram token filter:

$ curl -XPOST --data-binary @search.json 'localhost:9200/test_ngram/my_type/_validate/query?explain&pretty'

{
  "valid" : true,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "explanations" : [ {
    "index" : "test_ngram",
    "valid" : true,
    "explanation" : "filtered(name:brown name:fo)->cache(_type:my_type)"
  } ]
}

It searches for the whole terms "brown" or "fo", but not for "b", "br", "bro" and so on, which is the behavior I expected and which would return both documents. I also tried to force the analyzer by setting both index_analyzer and search_analyzer, with no luck.
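To make the expected versus observed behavior concrete, here is a rough Python sketch of the two analysis paths. It is only an approximation: it splits on whitespace instead of using the real standard tokenizer, the function names are made up for illustration, and the query text "brown fo" is inferred from the explain output above rather than taken from search.json.

```python
def edge_ngrams(token, min_gram=1, max_gram=20):
    """Return the prefixes of `token` from min_gram up to max_gram characters."""
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]


def plain_terms(text):
    """Approximate lowercase-only analysis (no edge n-grams)."""
    return set(text.lower().split())


def autocomplete_terms(text):
    """Approximate the 'autocomplete' analyzer: lowercase, then edge n-grams."""
    terms = set()
    for token in text.lower().split():
        terms.update(edge_ngrams(token))
    return terms


def matching_ids(docs, query, analyze):
    """Return ids of docs sharing at least one term with the query (OR semantics)."""
    query_terms = analyze(query)
    return sorted(doc_id for doc_id, name in docs.items()
                  if analyze(name) & query_terms)


docs = {1: "Brown foxes", 2: "Yellow furballs"}
query = "brown fo"  # assumed query text, inferred from the explain output

print(edge_ngrams("brown"))                             # ['b', 'br', 'bro', 'brow', 'brown']
print(matching_ids(docs, query, plain_terms))           # [1]     (what I actually get)
print(matching_ids(docs, query, autocomplete_terms))    # [1, 2]  (what the docs show)
```

With edge n-grams applied, document 2 matches because "fo" and "furballs" share the term "f". Without them, only the exact term "brown" matches, which is the result I am seeing.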

I'm pretty sure I'm doing something wrong, but I can't put my finger on it. Does anyone have a clue?

Thanks


(havetobe unknown) #2

OK guys, I've just managed to find the answer on my own. In the index definition, I was putting the mappings block inside the settings block, which is wrong. The mappings block should sit at the same level as settings! So the correct index definition is as follows:

{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0,
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type":     "edge_ngram",
          "min_gram": 1,
          "max_gram": 20
        }
      },
      "analyzer": {
        "autocomplete": {
          "type":      "custom",
          "tokenizer": "standard",
          "filter": [ "lowercase", "autocomplete_filter" ]
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "dynamic": "strict",
      "properties": {
        "name": {
          "type": "string",
          "analyzer": "autocomplete"
        }
      }
    }
  }
}

It would have been great if ES had thrown an error for my first configuration. Anyway, problem solved :slight_smile:
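Since ES accepted the first body silently, a small local check before sending index_settings.json could catch this kind of mistake. Below is a minimal sketch in Python. `find_misplaced_mappings` is a hypothetical helper written for this post, not part of Elasticsearch or any client library, and it only checks for this one specific mistake.

```python
import json


def find_misplaced_mappings(index_body):
    """Return warnings if a 'mappings' block is nested under 'settings'."""
    warnings = []
    settings = index_body.get("settings", {})
    if isinstance(settings, dict) and "mappings" in settings:
        warnings.append(
            "'mappings' is nested under 'settings'; move it to the top level"
        )
    return warnings


# Trimmed-down versions of the two index definitions from this thread.
broken = json.loads(
    '{"settings": {"number_of_shards": 1, "mappings": {"my_type": {}}}}'
)
fixed = json.loads(
    '{"settings": {"number_of_shards": 1}, "mappings": {"my_type": {}}}'
)

print(find_misplaced_mappings(broken))  # one warning
print(find_misplaced_mappings(fixed))   # []
```

In practice the body would be loaded from the file with `json.load(open("index_settings.json"))` instead of the inline strings used here.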

