Why is it?
Does it mean that it's meaningless to use multiple data nodes with lower RAM settings?
I have only 2~3GB data, and going to build a searching application (not logging data).
But I think I can improve its searching performance with multiple data nodes.
(Tell me if my idea is wrong)
It's a different approach, you would be better off having nodes in different datacentres (aka zones) to provide redundancy as well as potential performance improvements.
Like I mentioned, different philosophy. The idea being that the cloud is flexible, so you can start with a smaller cluster (or node) and then expand up as needed.
It does allow multiple nodes with lower RAM but you must specify more than 1 availability zone. This is generally enforced best practice as you always should try to distribute your cluster across multiple availability zones in the cloud to achieve higher resiliency. If resiliency is not a concern it is generally better to have a larger node as that reduces network traffic.
Thank you.
I cannot use more than 1 data nodes (per zone) even if I specify more than 1 zones.
So does it mean I should use only 1 data node as long as I use lower RAMs?
Apache, Apache Lucene, Apache Hadoop, Hadoop, HDFS and the yellow elephant
logo are trademarks of the
Apache Software Foundation
in the United States and/or other countries.