I would like to understand the performance characteristics of the context suggester. I am planning to use the completion suggester on a very large index (billions of documents, but only for a few fields). The field is an array of strings that is likely to contain duplicates across documents.
Here are some things I would like to understand:
- How does adding a context affect performance? Does it create one FST for each unique combination of contexts?
- What is the recommended number of shards (if I decide to put this in a separate index)? Should I keep the number of shards to a minimum?
- I am planning to use the `skip_duplicates` flag to filter out duplicates. What is the cost of using this flag?
- I read that it builds an FST that is kept on the heap. What are some recommendations for optimizing performance and memory footprint?
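For reference, this is roughly the setup I have in mind, written as Python dicts that would be serialized to JSON request bodies. It is only a sketch: the field name `suggest`, the context name `tenant`, the suggestion name `field-suggest` and the sample values are placeholders I made up. It shows a completion field with one category context, an indexed document whose `input` is the array of strings, and a suggest request that filters on one context value with `skip_duplicates` enabled.

```python
import json

# Placeholder mapping: one completion field with a single category context.
# "suggest" and "tenant" are hypothetical names, not from a real index.
mapping = {
    "mappings": {
        "properties": {
            "suggest": {
                "type": "completion",
                "contexts": [{"name": "tenant", "type": "category"}],
            }
        }
    }
}

# Placeholder document: the array of strings goes in "input"; the same
# strings are expected to repeat across many other documents.
doc = {
    "suggest": {
        "input": ["red shoes", "running shoes"],
        "contexts": {"tenant": ["acme"]},
    }
}


def suggest_request(prefix, tenant, size=5):
    """Build a completion-suggest body that filters on one context value
    and asks for duplicate suggestion texts to be skipped."""
    return {
        "suggest": {
            "field-suggest": {
                "prefix": prefix,
                "completion": {
                    "field": "suggest",
                    "size": size,
                    "skip_duplicates": True,
                    "contexts": {"tenant": [tenant]},
                },
            }
        }
    }


if __name__ == "__main__":
    # Print the request body that would be sent to the _search endpoint.
    print(json.dumps(suggest_request("run", "acme"), indent=2))
```

My questions above are essentially about what this combination (a category context plus `skip_duplicates`) costs at this scale.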
A detailed explanation of the internal implementation would also help me understand it better. I read http://blog.mikemccandless.com/2010/12/using-finite-state-transducers-in.html to get the overall idea, and I am going through the source in https://github.com/elastic/elasticsearch/tree/master/server/src/main/java/org/elasticsearch/search/suggest/completion. Some overall guidance would help me understand the source code better.