We are currently running ES v1.7.5 on Windows in a .NET environment with NEST v1.7.2. I have discovered that paging a query on an edge ngram field produces duplicate docs. I can see how an ngram field could be tricky to page, but this seems like a bug. Here's the operative part of my query:
"must": [{
    "multi_match": {
        "type": "best_fields",
        "query": "joe",
        "analyzer": "default",
        "fields": ["fullName", "fullName.engram"]
    }
}]
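A minimal sketch of how such duplicates can be spotted, assuming each page of results is available as a list of hits carrying an `_id` key. The helper name `duplicate_ids` and the page contents are made up for illustration; they are not output from our cluster.

```python
from collections import Counter


def duplicate_ids(pages):
    """Return the sorted _id values that occur more than once across all pages."""
    counts = Counter(hit["_id"] for page in pages for hit in page)
    return sorted(doc_id for doc_id, n in counts.items() if n > 1)


# Hypothetical hits for two consecutive pages (illustrative data only).
page_1 = [{"_id": "1"}, {"_id": "2"}]
page_2 = [{"_id": "2"}, {"_id": "3"}]

print(duplicate_ids([page_1, page_2]))  # ['2']
```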
If I leave out the .engram field, it pages fine. Even if I use only the .engram field in the fields element, it yields duplicate docs. For the "fullName" field, our mapping uses a multi-field, so for completeness, here is its definition together with how we've defined the edge ngram analyzer.
"fullName": {
    "type": "string",
    "fields": {
        "engram": {
            "type": "string",
            "analyzer": "edge_ngram_analyzer"
        },
        ...
        another custom analyzer of ours
    }
}
"filter": {
    "edge_ngram_filter": {
        "type": "edgeNGram",
        "min_gram": 2,
        "max_gram": 10
    }
},
"analyzer": {
    "edge_ngram_analyzer": {
        "type": "custom",
        "tokenizer": "icu_tokenizer",
        "filter": ["icu_normalizer", "icu_folding", "edge_ngram_filter"],
        "char_filter": ["html_strip"]
    }
}
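To show what ends up indexed in fullName.engram, here is a rough sketch of the grams the filter above would emit with min_gram 2 and max_gram 10. It is only an approximation: lowercasing and whitespace splitting stand in for the ICU tokenizer, normalizer and folding, html_strip is ignored, and the function names are made up for illustration.

```python
def edge_ngrams(token, min_gram=2, max_gram=10):
    """Return the leading-edge n-grams of one token, shortest first."""
    longest = min(len(token), max_gram)
    # A token shorter than min_gram yields an empty list in this sketch.
    return [token[:n] for n in range(min_gram, longest + 1)]


def analyze(text, min_gram=2, max_gram=10):
    """Approximate the analyzer: lowercase, split on whitespace, then gram each token."""
    grams = []
    for token in text.lower().split():
        grams.extend(edge_ngrams(token, min_gram, max_gram))
    return grams


print(analyze("Joe Smith"))
# ['jo', 'joe', 'sm', 'smi', 'smit', 'smith']
```

Since the query sets "analyzer": "default", the search text "joe" should not itself be grammed, so it would match the indexed gram "joe" directly.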