I have about 100M documents in App Search and performance is very bad. I need more Elasticsearch nodes (CPU, RAM, disk, ...), and I need to increase the number of Elasticsearch shards for my index.
I saw that App Search creates only 2 shards by default. I tried to manually create an index with more shards and link it to App Search, but each time, while we can add documents from App Search, searching the manually sharded index always fails with an error, e.g.: No mapping found for [location.enum] in order to sort on
My questions are:
How can we create a new engine with more than 2 shards?
If that's not possible, how can we manually create an empty Elasticsearch index with the desired number of shards and link it to a new App Search engine?
If we can't, how can we use App Search as an "elastic" solution able to be deployed on a horizontal cluster?
And if none of this is possible, do you plan to add this very important "Elastic" feature?
Sorry you are experiencing performance issues with your App Search index at this scale. We are actively working on building an API to make it easy to create an App Search engine with configurable sharding settings.
Some folks have worked around the limitation by splitting up their one large App Search engine into n smaller engines with an App Search meta engine to tie them together. This helps to parallelize queries such as group by queries to take better advantage of CPU and memory.
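If it helps while you wait, here is a rough sketch of that setup against the App Search REST API, using Python with `requests`. The host, API key, and engine names are all hypothetical - adjust them to your deployment, and double-check the endpoints against the docs for your version:

```python
import requests

# Hypothetical deployment details -- replace with your own.
BASE = "https://my-deployment.ent.example.com/api/as/v1"
HEADERS = {"Authorization": "Bearer private-xxxxxxxx"}  # App Search private API key

# Create the smaller source engines (docs-part-0 .. docs-part-N-1).
N = 4
for i in range(N):
    requests.post(f"{BASE}/engines", headers=HEADERS,
                  json={"name": f"docs-part-{i}"}).raise_for_status()

# Tie them together with a meta engine; queries against it fan out
# to every source engine in parallel.
requests.post(f"{BASE}/engines", headers=HEADERS, json={
    "name": "docs-meta",
    "type": "meta",
    "source_engines": [f"docs-part-{i}" for i in range(N)],
}).raise_for_status()
```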
Would you be willing to share more information about the kinds of queries that are performing poorly? For example, are you doing any grouping on high cardinality fields? There may be additional optimizations we can make. And which version of Enterprise Search are you running?
Hi Rich,
Thanks a lot for your reply. I am happy that you are aware of this limitation and are working on sharding for App Search engines. Do you have a release date for this functionality?
To keep my project working, I tried a workaround and split my big engine into about 250 smaller engines using a functional criterion. But some of these engines are still too big to work properly.
I have already banned high-cardinality grouping, and I use a minimal number of facets, each with a maximum cardinality of 1000.
I also use fuzzy search with the new precision tuning (I didn't test without precision), and I enabled allowlongquerry.
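For reference, my search calls look roughly like this (a sketch with hypothetical host, key, and engine names):

```python
import requests

BASE = "https://my-deployment.ent.example.com/api/as/v1"
HEADERS = {"Authorization": "Bearer private-xxxxxxxx"}

# Search with precision tuning: lower values favor recall (fuzzier
# matching), higher values favor precision.
resp = requests.post(f"{BASE}/engines/my-engine/search", headers=HEADERS,
                     json={"query": "some fuzzy term", "precision": 3})
resp.raise_for_status()
```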
The Elastic Stack version is 7.13.
When engines grow beyond 100 GB per shard, I get too many 500 errors, even with a long timeout configured.
The query suggestion API takes seconds to respond, and I can't optimize it because it is not filterable.
Do you think that if I set up many replicas for the two App Search shards, with a high number of nodes, I can expect better performance for big engines?
I would also be very happy if you could suggest other possible optimizations.
We have merged the changes to enable custom shard counts on App Search engines, and it should be out in one of the next releases. That should allow you to split your huge engines into many smaller shards.
Re: your suggestion to add a large number of replicas - it will not help with single-query performance, because a single query can only be parallelized across at most as many threads as there are shards. It may help with handling concurrent requests from multiple clients though, so I'd recommend trying it out.
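If you want to experiment with replicas, you can change them on the engine's underlying Elasticsearch index. A minimal sketch, assuming the 7.x convention that an engine's documents live in a hidden index named `.ent-search-engine-documents-<engine-name>` (verify the exact index name and credentials on your deployment):

```python
import requests

ES = "https://my-es-cluster.example.com:9200"  # hypothetical cluster URL
AUTH = ("elastic", "changeme")                 # hypothetical credentials

# Hidden index backing the App Search engine (the name is an assumption --
# list your indices to confirm the pattern on your cluster).
index = ".ent-search-engine-documents-docs-part-0"

resp = requests.put(f"{ES}/{index}/_settings", auth=AUTH,
                    json={"index": {"number_of_replicas": 3}})
resp.raise_for_status()
```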
One option that may sound unintuitive, but that still may help, is the following: instead of having one huge 100 GB+ engine with 2 shards, you can create a number of smaller engines (something like engine1, engine2, ..., engineN) by manually sharding your documents (using some kind of hash algorithm, or based on some document attribute) and then joining them into a single meta engine. Then, when you run a query on the meta engine, it will be distributed across your cluster as a number of parallel queries (2 per lower-level engine). A rough sketch of that routing is below.
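A minimal sketch of that manual sharding, reusing the hypothetical `docs-part-*` / `docs-meta` names from earlier and a simple stable hash on the document id (the documents and search endpoints here are the standard App Search ones; check them against your version's docs):

```python
import hashlib
import requests

BASE = "https://my-deployment.ent.example.com/api/as/v1"
HEADERS = {"Authorization": "Bearer private-xxxxxxxx"}
N = 4  # number of lower-level engines, docs-part-0 .. docs-part-3

def target_engine(doc_id: str) -> str:
    """Route a document to one of N engines using a stable hash of its id."""
    bucket = int(hashlib.md5(doc_id.encode()).hexdigest(), 16) % N
    return f"docs-part-{bucket}"

def index_document(doc: dict) -> None:
    # The documents endpoint accepts a JSON array of documents.
    engine = target_engine(doc["id"])
    requests.post(f"{BASE}/engines/{engine}/documents",
                  headers=HEADERS, json=[doc]).raise_for_status()

# Queries go to the meta engine and fan out to all parts in parallel.
resp = requests.post(f"{BASE}/engines/docs-meta/search",
                     headers=HEADERS, json={"query": "example"})
resp.raise_for_status()
```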