Search by digits doesn't work with edge_ngram


(Mikhail) #1

Hi there,

I am using an edge_ngram tokenizer to provide partial matching.
My scripts:

PUT docs
{
  "settings": {
    "analysis": {
      "analyzer": {
        "autocomplete": {
          "tokenizer": "autocomplete",
          "filter": ["lowercase"]
        },
        "autocomplete_search": {
          "tokenizer": "lowercase"
        }
      },
      "tokenizer": {
        "autocomplete": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 30,
          "token_chars": ["letter", "digit", "punctuation", "symbol"]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "name": {"type": "text","analyzer": "autocomplete","search_analyzer": "autocomplete_search"}
      }
    }
  }
}

PUT docs/doc/1
{ "name" : "111"}

PUT docs/doc/2
{"name" : "first"}

PUT docs/doc/3 
{"name" : "123456789й"}

PUT docs/doc/4
{"name" : "fir123"}

I expect to find documents 3 and 4 with the following query:

GET docs/_search
{
  "query": {
    "match": {
      "name": "12"
    }
  }
}

I also expect to find documents 2 and 4 with the following query:

GET docs/_search
{
  "query": {
    "match": {
      "name": "Fi"
    }
  }
}

What could be wrong?

Thanks!


(David Pilato) #2

Have a look at what your text is transformed into at search time:

POST docs/_analyze
{
  "analyzer": "autocomplete_search",
  "text": [ "123456789й" ]
}

{
  "tokens": [
    {
      "token": "й",
      "start_offset": 9,
      "end_offset": 10,
      "type": "word",
      "position": 0
    }
  ]
}

The lowercase tokenizer splits the text on every non-letter character, so the digits are dropped. Have a look at the doc.
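
A rough Python approximation of what the `lowercase` tokenizer used by `autocomplete_search` does with these inputs. This is only a sketch, not Elasticsearch code: `lowercase_tokenizer` is a hypothetical helper, and `str.isalpha` is assumed to be close enough to the letter check the real tokenizer uses.

```python
def lowercase_tokenizer(text):
    """Approximate the `lowercase` tokenizer: split on non-letters, lowercase the rest."""
    tokens, current = [], []
    for ch in text:
        if ch.isalpha():
            current.append(ch.lower())
        elif current:
            # A non-letter (digit, space, punctuation) ends the current token.
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


print(lowercase_tokenizer("123456789й"))  # ['й'], same token as the _analyze response
print(lowercase_tokenizer("12"))          # [] -> the query has no terms left
print(lowercase_tokenizer("Fi"))          # ['fi']
```

With an empty token list, a match query for "12" has nothing to look up, which would explain why it returns no hits.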


(Mikhail) #3

Hi, thanks for your answer. I have found a solution.

I changed my search analyzer to use the keyword tokenizer:

"autocomplete_search": {
  "type": "custom",
  "tokenizer": "keyword",
  "filter": "lowercase"
}
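
For anyone reading later, here is a small Python sketch of why this works. It is a simplified model under stated assumptions, not Elasticsearch code: `index_tokens`, `search_tokens` and `matching_ids` are hypothetical helpers, and plain whitespace splitting stands in for the `token_chars` setting. Index time produces lowercase edge n-grams; search time keeps the whole query as one lowercase token.

```python
MIN_GRAM, MAX_GRAM = 1, 30  # same values as the tokenizer settings in the first post


def index_tokens(text):
    """Model of the `autocomplete` analyzer: edge n-grams of each word, lowercased."""
    grams = set()
    for word in text.split():  # simplification of token_chars
        for n in range(MIN_GRAM, min(MAX_GRAM, len(word)) + 1):
            grams.add(word[:n].lower())
    return grams


def search_tokens(query):
    """Model of the new `autocomplete_search`: keyword tokenizer + lowercase filter."""
    return [query.lower()] if query else []


docs = {1: "111", 2: "first", 3: "123456789й", 4: "fir123"}


def matching_ids(query):
    """Return ids of documents whose index tokens contain a search token."""
    terms = search_tokens(query)
    return sorted(
        doc_id
        for doc_id, name in docs.items()
        if any(term in index_tokens(name) for term in terms)
    )


print(matching_ids("12"))  # [3]
print(matching_ids("Fi"))  # [2, 4]
```

In this model "12" matches document 3 only. Edge n-grams are anchored at the start of each token, so "fir123" produces "f", "fi", "fir", "fir1" and so on, but never "12".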
