Have you identified what is limiting performance? Is it disk I/O on the data nodes? Long or frequent GC on data or coordinating nodes? Is CPU saturated on some nodes? Is networking proving to be a bottleneck?
How much data do you have in the cluster? How many indices and shards? What type of complex queries are you running? How do you perform these regular updates?
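One way to start answering the bottleneck questions above is to pull the node stats and compare heap usage, old-generation GC activity, and CPU per node. Here is a minimal sketch, assuming the JSON shape returned by `GET _nodes/stats/jvm,os`. The function name `summarize_nodes` and the sample payload are made up for illustration; in practice you would feed it the real response body.

```python
def summarize_nodes(stats):
    """Build one summary row per node from a `_nodes/stats/jvm,os` response body."""
    rows = []
    for node_id, node in stats.get("nodes", {}).items():
        jvm = node.get("jvm", {})
        old_gc = jvm.get("gc", {}).get("collectors", {}).get("old", {})
        rows.append({
            "name": node.get("name", node_id),
            "heap_used_percent": jvm.get("mem", {}).get("heap_used_percent"),
            "old_gc_count": old_gc.get("collection_count"),
            "old_gc_time_ms": old_gc.get("collection_time_in_millis"),
            "cpu_percent": node.get("os", {}).get("cpu", {}).get("percent"),
        })
    return rows


# Hypothetical sample payload, trimmed to the fields read above.
sample_stats = {
    "nodes": {
        "abc123": {
            "name": "data-1",
            "jvm": {
                "mem": {"heap_used_percent": 82},
                "gc": {"collectors": {"old": {
                    "collection_count": 40,
                    "collection_time_in_millis": 12000,
                }}},
            },
            "os": {"cpu": {"percent": 95}},
        }
    }
}

for row in summarize_nodes(sample_stats):
    print(row)
```

Nodes with consistently high heap usage and growing old GC time point towards memory pressure, while high CPU with a healthy heap points more towards query or indexing load.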
I have around 120GB of data across 6 indices. One of them is the main index, holding around 110GB, and it is queried frequently. All indices have 5 shards. Partial updates are applied to the largest index throughout the day.
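With the numbers above, the largest index works out to roughly 110GB / 5 = 22GB per primary shard. To check the actual per-index figures, something like the following could be used. It is only a sketch, assuming the rows come from `GET _cat/shards?format=json&bytes=b`. The helper name `primary_store_by_index` and the index name `main` in the sample are hypothetical.

```python
from collections import defaultdict


def primary_store_by_index(cat_shards):
    """Sum primary shard store sizes per index from `_cat/shards` JSON rows."""
    totals = defaultdict(lambda: {"primaries": 0, "bytes": 0})
    for row in cat_shards:
        # Skip replicas and shards without a reported size (e.g. unassigned).
        if row.get("prirep") != "p" or row.get("store") is None:
            continue
        entry = totals[row["index"]]
        entry["primaries"] += 1
        entry["bytes"] += int(row["store"])  # the cat API returns values as strings
    return {
        index: {
            "primaries": t["primaries"],
            "total_gb": t["bytes"] / 1024**3,
            "avg_gb": t["bytes"] / t["primaries"] / 1024**3,
        }
        for index, t in totals.items()
    }


# Hypothetical sample: 5 primaries of 22 GiB each, plus one replica that is ignored.
GIB = 1024**3
sample_rows = [
    {"index": "main", "shard": str(i), "prirep": "p", "store": str(22 * GIB)}
    for i in range(5)
]
sample_rows.append({"index": "main", "shard": "0", "prirep": "r", "store": str(22 * GIB)})

print(primary_store_by_index(sample_rows))
```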