Completion suggester strange behavior after index optimization


(Mat) #1

Hi

I have a dataset of more than 60 million documents and I'm trying to build autocomplete on top of it using the Completion Suggester. Every document has a custom "score" (an integer, minimum 1000, maximum about 40 000) that represents its importance, so I used this score as the weight in my completion field.
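To make the setup concrete, here is a minimal sketch of how such a document could be built, assuming the ES 2.x document format where the completion field takes `input` and `weight`. The field names `name`, `score` and `name_suggest` are placeholders for illustration, not the names from my real mapping.

```python
import json


def completion_doc(name, score, field="name_suggest"):
    """Build a document whose completion field carries the score as its weight.

    `field` is a hypothetical completion field name; the mapping is assumed
    to declare it with "type": "completion".
    """
    if not isinstance(score, int) or score < 0:
        raise ValueError("weight must be a non-negative integer")
    return {
        "name": name,
        "score": score,
        # The suggester ranks matching inputs by this weight.
        field: {"input": [name], "weight": score},
    }


print(json.dumps(completion_doc("Microprocessor", 37000)))
```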

Right after the index has finished building, I can search all documents and the completion suggester returns relevant names ordered by weight. However, this only works for a few minutes after rebuilding the index. During that time Marvel shows some activity (the index size changes, although the document count stays the same). I suspect that is when ES is doing some of its internal optimizations. Once that work is finished, the completion suggester returns different results.

Example:
Searching for suggestions for "mic".
a) Results when index is finished building, but not finished optimizing (index size changes over time):
text: "Microprocessor", score: 37000
text: "Micron", score: 32000
text: "Microchip", score: 31500
text: "Micro controller", score: 28000

b) Results when index is finished building and optimizing (I wait until all charts on Marvel are flat):
text: "Micro controller", score: 28000
text: "Microphone", score: 27000
text: "Microwave", score: 22000
text: "Mickey Mouse", score: 21000

So basically some suggestions go missing after the optimizations. What's more, for some queries, running the same query twice gives different results (sometimes "Microprocessor" is in the results, so I get [Microprocessor, Micro controller, Microphone, Microwave]). When this happens (it happens for some queries and not for others), it is weirdly consistent: for example, every second "mic" query returns "Microprocessor" as the first result.
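To show exactly which suggestions are missing, the two result lists above can be compared with a small script. This is only a sketch working on the values quoted in this post; it assumes the suggestions seen across both runs are the only candidates for "mic", which may not hold for the full index.

```python
# Results copied from example (a) and example (b) above.
before = [("Microprocessor", 37000), ("Micron", 32000),
          ("Microchip", 31500), ("Micro controller", 28000)]
after = [("Micro controller", 28000), ("Microphone", 27000),
         ("Microwave", 22000), ("Mickey Mouse", 21000)]


def expected_top(*result_lists, size=4):
    """Merge all observed suggestions and return the `size` highest by score."""
    seen = {}
    for results in result_lists:
        for text, score in results:
            seen[text] = score
    ranked = sorted(seen.items(), key=lambda item: item[1], reverse=True)
    return ranked[:size]


def missing(expected, actual):
    """Return expected suggestion texts that are absent from `actual`."""
    actual_texts = {text for text, _ in actual}
    return [text for text, _ in expected if text not in actual_texts]


top = expected_top(before, after)
print(missing(top, after))  # → ['Microprocessor', 'Micron', 'Microchip']
```

The three highest-weighted suggestions are the ones that disappear, so list (b) is not a correct top 4 by weight.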

The described situation happens only for suggestion queries (POST my_index/_suggest). All search queries (POST my_index/_search) produce good, consistent results whenever I use them.

My configuration consists of 3 servers (nodes). The data is split into 3 shards (0, 1, 2), each with one replica, so every node holds 2 shard copies (1st node: shard 0 + replica 2; 2nd node: shard 2 + replica 1; 3rd node: shard 1 + replica 0). The results are the same regardless of which node I send my request to.
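Since every shard has two copies, one way to check whether the alternating results come from the primary and replica copies disagreeing would be to pin the request to the primaries and see if the output becomes stable. The sketch below only builds the request path and body. It assumes that `_suggest` honors the `preference` query parameter the way `_search` does in 2.x, which should be checked against the documentation for this version; the suggestion name `names` and the field `name_suggest` are placeholders.

```python
from urllib.parse import urlencode


def suggest_path(index, preference=None):
    """Return the _suggest path, optionally pinned with a preference value.

    Assumption: the endpoint accepts `preference` (e.g. "_primary").
    """
    path = "/%s/_suggest" % index
    if preference:
        path += "?" + urlencode({"preference": preference})
    return path


def suggest_body(prefix, field="name_suggest", size=4):
    """Build a completion suggest body; `names` and `field` are hypothetical."""
    return {"names": {"text": prefix, "completion": {"field": field, "size": size}}}


print("POST", suggest_path("my_index"))
print("POST", suggest_path("my_index", "_primary"))
print(suggest_body("mic"))
```

If the pinned request returns the same list every time while the unpinned one alternates, that would point at the copies of a shard holding different suggester data.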

My ES version: 2.2.1
Lucene version: 5.4.1

Have you ever encountered such behavior?

Regards
Matthew


(Ccmb China) #2

Have you worked this out yet? I am also running into this problem on the same version.

