From the Elasticsearch documentation:
The match_phrase query first analyzes the query string to produce a list of terms. It then searches for all the terms, but keeps only documents that contain all of the search terms, in the same positions relative to each other.
I have configured my analyzer to use an edge_ngram filter with the keyword tokenizer:
{
  "index": {
    "number_of_shards": 1,
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 20
        }
      },
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [
            "lowercase",
            "autocomplete_filter"
          ]
        }
      }
    }
  }
}
Here is the Java class that is used for indexing:
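import lombok.Getter;
import lombok.Setter;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;
import org.springframework.data.elasticsearch.annotations.Setting;

@Document(indexName = "myindex", type = "program")
@Getter
@Setter
@Setting(settingPath = "/elasticsearch/settings.json")
public class Program {
    @org.springframework.data.annotation.Id
    private Long instanceId;

    // The same edge_ngram analyzer is applied at index time and at search time.
    @Field(analyzer = "autocomplete", searchAnalyzer = "autocomplete", type = FieldType.String)
    private String name;
}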
If a document contains the phrase "hello world", the following query matches it:
{
  "match": {
    "name": {
      "query": "ho",
      "type": "phrase"
    }
  }
}
Result: "hello world"
That is not what I expect, because not all of the search terms are in the document.
My questions:
1- Shouldn't the edge_ngram/autocomplete analyzer produce 2 search terms for the query "ho"? (The terms should be "h" and "ho" respectively.)
2- Why does "ho" match "hello world" when, according to the definition of the phrase query, not all of the terms matched? (The term "ho" shouldn't have matched.)
Update:
In case the question is not clear: the match phrase query should analyze the query string into a list of terms; here the string is ho. Because this is an edge_ngram filter with min_gram 1, we get 2 terms: h and ho. According to the Elasticsearch documentation, the document must contain all of the search terms. However, hello world has only h and does not have ho, so why did I get a match here?
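To make the terms I expect concrete, here is a minimal Java sketch that imitates the analyzer chain from the settings above (keyword tokenizer, lowercase, edge_ngram with min_gram 1 and max_gram 20) using plain string operations. The class EdgeNgramSketch and its analyze method are hypothetical names made up for illustration. This is not how Elasticsearch or Lucene implement analysis, and it does not model how the phrase query combines the terms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class EdgeNgramSketch {

    // Hypothetical helper (not an Elasticsearch or Lucene API).
    // Imitates: keyword tokenizer -> lowercase -> edge_ngram(minGram, maxGram).
    public static List<String> analyze(String text, int minGram, int maxGram) {
        // The keyword tokenizer keeps the whole input as a single token.
        String token = text.toLowerCase(Locale.ROOT);
        List<String> grams = new ArrayList<>();
        // Emit every prefix whose length lies between minGram and maxGram.
        int longest = Math.min(maxGram, token.length());
        for (int len = minGram; len <= longest; len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }

    public static void main(String[] args) {
        List<String> queryTerms = analyze("ho", 1, 20);
        List<String> documentTerms = analyze("hello world", 1, 20);

        System.out.println("query terms:    " + queryTerms);
        System.out.println("document terms: " + documentTerms);

        // Check each query term against the terms indexed for the document.
        for (String term : queryTerms) {
            System.out.println("'" + term + "' indexed for document: "
                    + documentTerms.contains(term));
        }
    }
}
```

Under these assumptions the sketch yields the two query terms h and ho, yields the eleven prefixes of hello world for the document, and reports that h is among the document terms while ho is not, which is the mismatch I am asking about.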
Version: Elasticsearch 2.x