Paging w/ multi_match on edge ngram produces duplicate docs

We are currently using ES v1.7.5 on Windows in a .NET environment with NEST v1.7.2. I have discovered that paging through results matched on an edge ngram field produces duplicate docs. I can see how an ngram could be tricky to page, but this seems like a bug. Here's the operative part of my query:

"must": [{
   "multi_match": {
      "type": "best_fields",
      "query": "joe",
      "analyzer": "default",
      "fields": ["fullName",
                 "fullName.engram"]
   }
}]

If I leave out the .engram field, it pages fine. Even if I only use the .engram field in the fields element it will yield duplicate docs. For the "fullName" field, our mapping uses a multi-field so, for completeness, here is the definition together w/ how we've defined the edge ngram.

"fullName": {
	"type": "string",
	"fields": {
		"engram": {
			"type": "string",
			"analyzer": "edge_ngram_analyzer"
		},
		...
		another custom analyzer of ours
	}
}
"filter": {
	"edge_ngram_filter": {
		"type": "edgeNGram",
		"min_gram": 2,
		"max_gram": 10
	}
},
"analyzer": {
	"edge_ngram_analyzer": {
		"type": "custom",
		"tokenizer": "icu_tokenizer",
		"filter": ["icu_normalizer",
		"icu_folding",
		"edge_ngram_filter"],
		"char_filter": ["html_strip"]
	}
}
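
For reference, here is a rough sketch of what an edge ngram filter with min_gram 2 and max_gram 10 emits at index time. It is an approximation only: plain whitespace splitting and lowercasing stand in for icu_tokenizer, icu_normalizer and icu_folding, and the function names are made up for illustration.

```python
def edge_ngrams(token, min_gram=2, max_gram=10):
    """Return the leading-edge n-grams of one token, shortest first."""
    upper = min(len(token), max_gram)
    # Tokens shorter than min_gram produce no grams at all.
    return [token[:n] for n in range(min_gram, upper + 1)]


def analyze(text, min_gram=2, max_gram=10):
    """Approximate the index-time token stream for a fullName.engram value."""
    grams = []
    # Stand-in for the ICU tokenizer and folding: split on whitespace, lowercase.
    for token in text.lower().split():
        grams.extend(edge_ngrams(token, min_gram, max_gram))
    return grams


if __name__ == "__main__":
    print(analyze("Joe Smith"))
```

Under these assumptions, a query for "joe" analyzed with a non-ngram analyzer matches any name containing a word that starts with "joe", since that exact gram is in the index for each of them.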

Hi @Thomas_Doman

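I think what you're seeing is "bouncing results". Each shard has a primary and usually one or more replica copies, and those copies can hold the same documents in different segments, which means that the scores they compute are slightly different. Because search requests are spread across the copies, two pages of the same query can be answered by copies that order the documents differently, so a document can show up on more than one page. If you use ?preference=$session_id or something similar to ensure that the same user always hits the same shard copy, this behaviour should disappear.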
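
To make that concrete, here is a small simulation rather than real Elasticsearch code. It assumes two copies of one shard that return the same four documents in slightly different order, and compares paging that alternates between copies with paging pinned to one copy. The document IDs, host, index name and session ID are all invented for the example; only the preference, from and size URL parameters are real.

```python
from itertools import cycle
from urllib.parse import urlencode

# Hypothetical data: both copies hold the same docs, but near-tied scores
# make each copy return them in a slightly different order.
COPIES = {
    "primary": ["doc1", "doc2", "doc3", "doc4"],
    "replica": ["doc2", "doc1", "doc4", "doc3"],
}


def fetch_page(copy_name, start, size):
    """Return one page of hits as ranked by the given shard copy."""
    return COPIES[copy_name][start:start + size]


def page_through(copy_names, size=1, pages=4):
    """Fetch consecutive pages, rotating through the given copies."""
    picker = cycle(copy_names)
    seen = []
    for page in range(pages):
        seen.extend(fetch_page(next(picker), page * size, size))
    return seen


def search_url(base, index, session_id, start, size):
    """Build a _search URL that pins the request to one set of shard copies."""
    params = urlencode({"preference": session_id, "from": start, "size": size})
    return f"{base}/{index}/_search?{params}"


if __name__ == "__main__":
    print(page_through(["primary", "replica"]))  # alternating copies
    print(page_through(["primary"]))             # pinned to one copy
    print(search_url("http://localhost:9200", "people", "session-42", 10, 10))
```

In the alternating run the same document appears on two consecutive pages while others are never shown; the pinned run returns each document once. Passing the same preference string on every page request from a given user is what gives the pinned behaviour.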
