Completion suggester strange behavior after index optimization

(Mat) #1


I have a dataset of more than 60 million documents and am trying to build autocomplete with the Completion Suggester. Every document has a custom "score" (an integer between 1,000 and roughly 40,000) that represents its importance, so I use this score as the weight in my completion field.
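
A setup like the one described could look roughly as follows. This is only a sketch: the type name `product`, the field names `name` and `name_suggest`, and the helper `build_doc` are placeholders that do not come from the post. The bodies assume the ES 2.x completion field format (`input` plus an integer `weight`).

```python
import json

# Hypothetical ES 2.x mapping: a completion field next to the plain name.
# The type name "product" and both field names are assumptions.
MAPPING = {
    "mappings": {
        "product": {
            "properties": {
                "name": {"type": "string"},
                "name_suggest": {"type": "completion"},
            }
        }
    }
}


def build_doc(name, score):
    """Build a document whose completion weight is the custom score."""
    return {
        "name": name,
        "name_suggest": {"input": [name], "weight": int(score)},
    }


if __name__ == "__main__":
    print(json.dumps(MAPPING, indent=2))
    print(json.dumps(build_doc("Microprocessor", 37000), indent=2))
```

With a body like this, the suggester would be expected to order candidates for a prefix by `weight`, highest first.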

Right after the index finishes building, I can search all documents and the completion suggester returns relevant names ordered by weight. However, this works only for a few minutes after rebuilding the index. During that time Marvel shows some activity (the index size changes, although the document count stays the same). I suspect that is when ES is doing some of its internal optimizations. Once all of that work has finished, the completion suggester returns different results.

Example: suggestions for "mic".
a) Results when the index has finished building but has not finished optimizing (index size still changing over time):
text: "Microprocessor", score: 37000
text: "Micron", score: 32000
text: "Microchip", score: 31500
text: "Micro controller", score: 28000

b) Results when the index has finished building and optimizing (I wait until all charts in Marvel are flat):
text: "Micro controller", score: 28000
text: "Microphone", score: 27000
text: "Microwave", score: 22000
text: "Mickey Mouse", score: 21000

So some suggestions go missing after the optimizations: "Microprocessor", "Micron" and "Microchip" all have higher scores than "Micro controller" but no longer appear. What's more, for some queries, running the same query twice gives different results (sometimes "Microprocessor" is back, so I get [Microprocessor, Micro controller, Microphone, Microwave]). This happens for some queries and not for others, but when it does, it is weirdly consistent: for example, every second "mic" query returns "Microprocessor" as the first result.

This happens only for suggestion queries (POST my_index/_suggest). Search queries (POST my_index/_search) return correct, consistent results every time.
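
For concreteness, a suggestion request of the kind described could be prepared as sketched below. The host `http://localhost:9200`, the field name `name_suggest`, the label `name_completion`, and `size=4` (chosen to match the four results listed) are all assumptions, and the body follows the ES 2.x `_suggest` format. The sketch only builds the request; it does not send it.

```python
import json
import urllib.request


def build_suggest_request(base_url, index, prefix, field="name_suggest", size=4):
    """Prepare (but do not send) a POST <index>/_suggest request."""
    body = {
        "name_completion": {  # arbitrary label for this suggestion block
            "text": prefix,
            "completion": {"field": field, "size": size},
        }
    }
    return urllib.request.Request(
        url="{}/{}/_suggest".format(base_url.rstrip("/"), index),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_suggest_request("http://localhost:9200", "my_index", "mic")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it: urllib.request.urlopen(req).read()
```

Sending the same prepared request several times in a row and comparing the responses is a simple way to capture the inconsistency described above.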

My cluster consists of 3 servers (nodes). The data is split into 3 shards (0, 1, 2), each with one replica, so every node holds 2 shard copies (node 1: shard 0 + replica of shard 2; node 2: shard 2 + replica of shard 1; node 3: shard 1 + replica of shard 0). The results are the same regardless of which node I send the request to.
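
One pattern that would produce an "every second request" alternation with this layout is two copies of the same shard answering differently while requests rotate between them. The toy model below illustrates that idea only; it is not a diagnosis. Which shard holds which suggestion is invented, the assumption that one replica has lost "Microprocessor" is hypothetical, and real routing picks a copy per shard rather than switching all shards at once.

```python
import itertools

# Toy model: the assignment of suggestions to shards is made up.
PRIMARIES = {
    0: [("Microprocessor", 37000), ("Microwave", 22000)],
    1: [("Micro controller", 28000), ("Mickey Mouse", 21000)],
    2: [("Microphone", 27000)],
}
# Assumption: the replica of shard 0 no longer returns "Microprocessor".
REPLICAS = {
    0: [("Microwave", 22000)],
    1: PRIMARIES[1],
    2: PRIMARIES[2],
}


def suggest(copies, size=4):
    """Merge the per-shard entries and keep the `size` heaviest ones."""
    merged = [entry for shard in copies.values() for entry in shard]
    merged.sort(key=lambda entry: entry[1], reverse=True)
    return [text for text, _ in merged[:size]]


def alternating_requests(n):
    """Alternate between the two sets of copies on successive requests."""
    rotation = itertools.cycle([PRIMARIES, REPLICAS])
    return [suggest(next(rotation)) for _ in range(n)]


if __name__ == "__main__":
    for result in alternating_requests(4):
        print(result)
```

In this model every second request starts with "Microprocessor" and the others start with "Micro controller", which mirrors the two result lists quoted in the post.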

My ES version: 2.2.1
Lucene version: 5.4.1

Has anyone encountered this behavior?


(Ccmb China) #2

Have you worked this out yet? I am also running into this problem on the same version.
