Hiya
There has been a change in behaviour in how analyzers are applied
between 0.15.2 and 0.16.0.
For instance, in 0.15.2, an 'edge-ngram' analyzer was applied at
index time, but not at search time.
In 0.16.0, it is also applied at search time.
For example, I index a field containing "london" (with an edge-ngram
analyzer). I expect a search for 'lon' to match the doc, but not a
search for 'londres'. This was the case in 0.15.2.
However, because the query string is now passed through the same
analyzer, the query actually searches for the terms
"l", "lo", "lon", "lond", "londr"... etc, so this search DOES match in 0.16.0.
It can easily be worked around by using a different analyzer at index
and search time, but I'm not sure that this is the correct behaviour. It
makes sense for (eg) snowball analyzers, but for ngrams?
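For example, the field mapping could declare separate analyzers (just a
sketch; using "standard" as the search analyzer is an illustrative choice):

"tokens" : {
    "type" : "string",
    "index_analyzer" : "edge_ngram",
    "search_analyzer" : "standard"
}

That way 'londres' is left as a single term at search time and no longer
matches, while 'lon' still matches against the edge-ngram terms produced
at index time.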
What do you think?
clint
# Create the index, with an edge_ngram analyzer applied to the "tokens" field:
curl -XPUT 'http://127.0.0.1:9200/foo/?pretty=1' -d '
{
    "mappings" : {
        "bar" : {
            "properties" : {
                "tokens" : {
                    "type" : "string",
                    "analyzer" : "edge_ngram"
                }
            }
        }
    },
    "settings" : {
        "analysis" : {
            "filter" : {
                "edge_ngram" : {
                    "side" : "front",
                    "max_gram" : 20,
                    "min_gram" : 1,
                    "type" : "edgeNGram"
                }
            },
            "analyzer" : {
                "edge_ngram" : {
                    "filter" : [
                        "standard",
                        "lowercase",
                        "edge_ngram"
                    ],
                    "type" : "custom",
                    "tokenizer" : "standard"
                }
            }
        }
    }
}
'
# Index a doc containing "london" and refresh so it is searchable:
curl -XPOST 'http://127.0.0.1:9200/foo/bar?pretty=1&refresh=true' -d '
{
    "tokens" : "london"
}
'
# Search the field for "londres":
curl -XGET 'http://127.0.0.1:9200/foo/bar/_search?pretty=1' -d '
{
    "query" : {
        "field" : {
            "tokens" : "londres"
        }
    }
}
'
RESULT IN 0.15.2:
{
    "hits" : {
        "hits" : [],
        "max_score" : null,
        "total" : 0
    },
    "timed_out" : false,
    "_shards" : {
        "failed" : 0,
        "successful" : 5,
        "total" : 5
    },
    "took" : 3
}
RESULT IN 0.16.0: