Paging w/ multi_match on edge ngram produces duplicate docs


(Thomas Doman) #1

We are currently using ES v1.7.5 on Windows in a .NET environment with NEST v1.7.2. I have discovered that paging through results matched on an edge ngram field produces duplicate docs across pages. I can see how an ngram could be tricky to page, but this seems like a bug. Here's the operative part of my query:

"must": [{
   "multi_match": {
      "type": "best_fields",
      "query": "joe",
      "analyzer": "default",
      "fields": ["fullName", "fullName.engram"]
   }
}]

If I leave out the .engram field, it pages fine. Even if I only use the .engram field in the fields element it will yield duplicate docs. For the "fullName" field, our mapping uses a multi-field so, for completeness, here is the definition together w/ how we've defined the edge ngram.

"fullName": {
	"type": "string",
	"fields": {
		"engram": {
			"type": "string",
			"analyzer": "edge_ngram_analyzer"
		},
		...
		another custom analyzer of ours
	}
}
"filter": {
	"edge_ngram_filter": {
		"type": "edgeNGram",
		"min_gram": 2,
		"max_gram": 10
	}
},
"analyzer": {
	"edge_ngram_analyzer": {
		"type": "custom",
		"tokenizer": "icu_tokenizer",
		"filter": ["icu_normalizer", "icu_folding", "edge_ngram_filter"],
		"char_filter": ["html_strip"]
	}
}
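
For anyone following along, here is a rough Python sketch of which terms a filter configured like edge_ngram_filter (min_gram 2, max_gram 10) would emit for a single token. It is only a stand-in: the function name edge_ngrams is made up for illustration, and the sketch ignores icu_tokenizer, icu_normalizer, icu_folding and html_strip entirely, so it assumes the token has already been split out and lowercased.

```python
def edge_ngrams(token, min_gram=2, max_gram=10):
    """Return the leading-edge n-grams of one already-normalized token.

    Hypothetical helper mimicking an edgeNGram token filter: it emits
    every prefix whose length is between min_gram and max_gram.
    Tokens shorter than min_gram produce nothing.
    """
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


if __name__ == "__main__":
    for name in ["joe", "joel", "joseph"]:
        print(name, "->", edge_ngrams(name))
```

Under these assumptions, both "joe" and "joel" index the term "joe" in fullName.engram, so a query for "joe" can match many names on that field.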

(Clinton Gormley) #2

Hi @Thomas_Doman

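I think what you're seeing is "bouncing results": the primary and replica copies of a shard can hold the same documents in different segments, so term statistics, and therefore scores, differ slightly from copy to copy. When consecutive page requests are served by different copies, documents with near-identical scores can swap positions and show up on more than one page. If you use ?preference=$session_id or something similar to ensure that the same user always hits the same shard copy, this behaviour should disappear.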
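
As a minimal sketch of that suggestion, the Python below builds the request path and body for each page and sends the same preference value every time. Nothing here talks to a cluster. The index name "people", the session id, and the helper build_paged_search are all hypothetical, and wrapping the "must" clause in a bool query is an assumption about the part of the query that was not posted.

```python
import json
from urllib.parse import urlencode


def build_paged_search(index, session_id, page, size=10):
    """Return (path, body) for one page of results.

    Hypothetical helper: the same session_id is passed as `preference`
    on every page so all pages for one user go to the same shard copies.
    """
    if session_id.startswith("_"):
        # Values starting with "_" are reserved for built-in
        # preferences such as _primary or _local.
        raise ValueError("custom preference must not start with '_'")
    params = urlencode({"preference": session_id})
    body = {
        "from": page * size,  # zero-based page number
        "size": size,
        "query": {
            "bool": {  # assumed wrapper around the posted "must" clause
                "must": [{
                    "multi_match": {
                        "type": "best_fields",
                        "query": "joe",
                        "analyzer": "default",
                        "fields": ["fullName", "fullName.engram"],
                    }
                }]
            }
        },
    }
    return "/{}/_search?{}".format(index, params), body


if __name__ == "__main__":
    for page in range(2):
        path, body = build_paged_search("people", "session-abc123", page)
        print("POST", path)
        print(json.dumps(body, indent=2))
```

The only thing that changes between pages is "from"; the preference string stays fixed for the life of the user's session.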

