I am getting unexpected matches on a field whose analyzer contains an edge_ngram filter.
This is how I set up the mappings and the settings:
PUT test-index
{
  "mappings": {
    "properties": {
      "name": { "type": "text", "analyzer": "my_custom_analyzer" }
    }
  },
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "char_filter": [
            "html_strip"
          ],
          "filter": [
            "my_edge_gram"
          ]
        }
      },
      "filter": {
        "my_edge_gram": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 20
        }
      }
    }
  }
}
And then I added three documents like this:
PUT test-index/_doc/1
{
  "name": "Mario"
}
PUT test-index/_doc/2
{
  "name": "Maria"
}
PUT test-index/_doc/3
{
  "name": "Merle"
}
Then when I executed this query:
GET test-index/_search
{
  "query": {
    "match": {
      "name": "Mario"
    }
  }
}
I got all three documents.
I was under the impression that the edge_ngram filter splits a value like "Mario" into the terms "M", "Ma", "Mar", "Mari" and "Mario". So I expected a search for "M" to return all three documents, a search for "Ma", "Mar" or "Mari" to return only the documents "Mario" and "Maria", and a search for "Mario" to return only the document "Mario".
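To make that expectation concrete, here is a small Python sketch of my mental model. It is not Elasticsearch code: the helpers edge_ngrams and expected_hits are names I made up for illustration, and the sketch assumes the search term is compared as-is against the indexed prefixes, which may not be what the match query really does.

```python
def edge_ngrams(term, min_gram=1, max_gram=20):
    """Return the prefixes of term that are min_gram..max_gram characters long."""
    longest = min(len(term), max_gram)
    return [term[:n] for n in range(min_gram, longest + 1)]


# The three documents from above, keyed by document ID.
documents = {1: "Mario", 2: "Maria", 3: "Merle"}

# Terms I expect to be indexed for each document (hypothetical model of the index).
indexed_terms = {doc_id: set(edge_ngrams(name)) for doc_id, name in documents.items()}


def expected_hits(search_term):
    """IDs of documents whose indexed prefixes contain search_term unchanged.

    Assumption: the search term itself is NOT split into prefixes.
    """
    return sorted(
        doc_id for doc_id, terms in indexed_terms.items() if search_term in terms
    )


if __name__ == "__main__":
    print(edge_ngrams("Mario"))
    for term in ("M", "Mar", "Mario"):
        print(term, expected_hits(term))
```

Under this model, only document 1 should come back for "Mario", which is not what the query above returns.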
There seems to be something I have misunderstood about edge_ngram. Can you explain why the query for "Mario" matches all three documents?