I have an update-heavy use case where each record gets updated multiple times every minute. There will only be several million documents, each smaller than 4KB. Also, the rate at which the documents in each index get updated is not consistent: one index can have many more updates than another. In such scenarios, is it better to have one index with enough shards (say, 1 index with 50 shards) or multiple indexes with fewer shards each (50 indexes with 2 shards each)?
Regarding search, there is no difference for Elasticsearch whether you search one index consisting of 50 shards or 50 indices consisting of one shard each: in total, 50 shards need to be searched.
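To make the shard counting concrete, here is a minimal sketch that compares the layouts mentioned in this thread. It counts primary shards only and ignores replicas, which is an assumption; the function name `total_shards` is purely illustrative.

```python
def total_shards(index_count: int, shards_per_index: int) -> int:
    """Primary shards a search across all indices has to touch (replicas ignored)."""
    return index_count * shards_per_index


# Layouts taken from the thread: (number of indices, shards per index)
layouts = {
    "1 index x 50 shards": (1, 50),
    "50 indices x 1 shard": (50, 1),
    "50 indices x 2 shards": (50, 2),
}

for name, (indices, shards) in layouts.items():
    print(f"{name}: {total_shards(indices, shards)} shards to search")
```

The first two layouts come out identical, while the 50 x 2 layout from the question doubles the number of shards involved.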
A general rule of thumb should be to try and keep the number of shards low.
Several million documents of 4KB each does not sound terribly big and should not be stored in 50 shards. How did you come up with that number? Did you compare the performance against 5 shards, and was 50 faster?
Coming back to the update-heavy use case: an update of a document is basically a reindexing of that document. That means you will have tens of millions of updates every minute, so easily several hundred thousand writes per second. This is a very write-heavy load and will require a fair share of resources.
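As a rough back-of-the-envelope check of that estimate, the sketch below uses assumed numbers: 5 million documents and 3 updates per document per minute. Both values are placeholders for "several million" and "multiple times every minute", not figures from the original question.

```python
def writes_per_second(doc_count: int, updates_per_doc_per_minute: float) -> float:
    """Each update is a reindex, so every update counts as one write."""
    return doc_count * updates_per_doc_per_minute / 60


# Assumed values, standing in for "several million" and "multiple times every minute"
docs = 5_000_000
updates_per_minute = 3

per_minute = docs * updates_per_minute
per_second = writes_per_second(docs, updates_per_minute)

print(f"{per_minute:,} updates per minute")        # 15,000,000 updates per minute
print(f"{per_second:,.0f} writes per second")      # 250,000 writes per second
```

Plug in your real document count and update frequency to see what sustained write rate the cluster would actually have to handle.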
Currently I have an index per customer, and there are about 40 indexes with 2 shards each. I am facing huge performance issues even though there are 10 nodes: 3 dedicated masters and 7 data nodes, each with 8 vcores. The CPU spikes to >90% and the bulk requests start failing.
That's why I'm trying to explore other options to see if a single index would work. 50 shards was just an approximate number.
Do you think Elasticsearch is the right tool given that there are so many updates per second? I ended up with Elasticsearch since we need to search documents by partial matches, run aggregations, etc.
Note: There would easily be 10k updates per second during peak times.
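For a sense of scale, here is a quick sketch that spreads that peak rate over the cluster described above (7 data nodes, about 40 indexes with 2 shards each). It assumes a perfectly even distribution and ignores replicas, which will not hold here since some indexes receive far more updates than others, so treat the results as averages only.

```python
PEAK_UPDATES_PER_SECOND = 10_000  # peak rate from the note above
DATA_NODES = 7                    # data nodes in the current cluster
PRIMARY_SHARDS = 40 * 2           # about 40 indexes with 2 shards each

# Averages under the (unrealistic) assumption of evenly spread load
per_node = PEAK_UPDATES_PER_SECOND / DATA_NODES
per_shard = PEAK_UPDATES_PER_SECOND / PRIMARY_SHARDS

print(f"about {per_node:.0f} updates/s per data node")
print(f"about {per_shard:.0f} updates/s per primary shard")
```

Because the update rate is skewed per customer, the busiest shards and the nodes hosting them will see considerably more than these averages.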