Paging w/ multi_match on edge ngram produces duplicate docs

Thomas_Doman · January 16, 2017, 7:17pm

We are currently using ES v1.7.5 on Windows in a .NET environment with NEST v1.7.2. I have discovered that paging through the results of a query on an edge ngram field produces duplicate docs. I can see how an ngram could be tricky to page, but this seems like a bug. Here's the operative part of my query:

"must": [{
   "multi_match": {
      "type": "best_fields",
      "query": "joe",
      "analyzer": "default",
      "fields": ["fullName", "fullName.engram"]
   }
}]

If I leave out the .engram field, it pages fine. If I use only the .engram field in the fields element, it still yields duplicate docs. For the "fullName" field, our mapping uses a multi-field so, for completeness, here is the definition together w/ how we've defined the edge ngram.

"fullName": {
	"type": "string",
	"fields": {
		"engram": {
			"type": "string",
			"analyzer": "edge_ngram_analyzer"
		},
		...
		another custom analyzer of ours
	}
}

"filter": {
	"edge_ngram_filter": {
		"type": "edgeNGram",
		"min_gram": 2,
		"max_gram": 10
	}
},
"analyzer": {
	"edge_ngram_analyzer": {
		"type": "custom",
		"tokenizer": "icu_tokenizer",
		"filter": ["icu_normalizer",
		"icu_folding",
		"edge_ngram_filter"],
		"char_filter": ["html_strip"]
	}
}
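
For readers unfamiliar with the filter above, here is a rough Python sketch of what an edge ngram filter with min_gram 2 and max_gram 10 emits at index time. It is only an approximation: lowercasing and whitespace splitting stand in for icu_tokenizer, icu_normalizer and icu_folding, and the function names are made up for illustration.

```python
def edge_ngrams(token, min_gram=2, max_gram=10):
    """Return the prefixes of `token` whose lengths run from min_gram to max_gram."""
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]


def analyze(text):
    # Assumption: lower() and split() are crude stand-ins for the ICU
    # tokenizer, normalizer and folding filters in the real analyzer.
    tokens = text.lower().split()
    return [gram for tok in tokens for gram in edge_ngrams(tok)]


print(analyze("Joe Smith"))
print(analyze("Joel"))
```

Under these assumptions, both "Joe Smith" and "Joel" index the term "joe" in fullName.engram, so the query "joe" matches every name with a word starting with those letters.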

Clinton_Gormley · January 30, 2017, 1:01pm

Hi @Thomas_Doman

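I think what you're seeing is "bouncing results". The primary and replica copies of a shard can hold the same documents in different segments, so their term statistics, and therefore the scores, differ slightly between copies. When consecutive page requests are served by different copies, the sort order can shift between pages and the same document can show up twice. If you use ?preference=$session_id or something similar to ensure that the same user always hits the same shard copy, this behaviour should disappear.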
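
To make the effect concrete, here is a small Python simulation. It does not talk to Elasticsearch; the document ids and scores are invented, and PRIMARY and REPLICA are hypothetical stand-ins for two copies of one shard that score the same documents slightly differently.

```python
# Hypothetical scores for the same four documents on two copies of one shard.
PRIMARY = {"doc1": 1.00, "doc2": 0.99, "doc3": 0.98, "doc4": 0.97}
REPLICA = {"doc1": 1.00, "doc2": 0.98, "doc3": 0.99, "doc4": 0.97}


def page(scores, start, size):
    """Rank ids by descending score (ties broken by id), then slice like from/size."""
    ranked = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
    return ranked[start:start + size]


# Without a preference, page 1 may be served by one copy and page 2 by the other.
unpinned = page(PRIMARY, 0, 2) + page(REPLICA, 2, 2)

# With a fixed preference value, both pages are served by the same copy.
pinned = page(PRIMARY, 0, 2) + page(PRIMARY, 2, 2)

print("unpinned:", unpinned)
print("pinned:  ", pinned)
```

In the unpinned run doc2 appears on both pages and doc3 is never shown; in the pinned run every document appears exactly once. Sending the same preference value with every page request for a given user is what pins the copy.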
