I'm setting up an Elasticsearch 5.4 cluster on AWS (can't use the managed service). This is my current setup:
3 master nodes (r4.xlarge, with high-performance EBS storage)
3 data nodes (r4.xlarge, with high-performance EBS storage)
The setup works fine, but I still have some doubts:
If the master nodes are not storing any data, can I safely remove the EBS storage?
On data insertion, it really looks like the master nodes are not suffering at all, while the data nodes are doing the heavy work. Is there anything against moving the master nodes to smaller instance types?
I'm thinking of adding an ingest-only node, but I'm wondering: Can I add just one? What if it goes down for any reason? Is it a recommended practice to have at least 3 nodes of each type to guarantee cluster stability?
I've looked in many places but can't seem to find an answer to these questions. Sorry if they're too simple.
I did look at Elastic Cloud, but for now it's not an option. It may be in the near future though (hopefully... as I always prefer a managed service).
No, I don't have any pipeline. Sorry, I didn't add an "IF" in my post! I meant "If I decide to add an ingest-only node...". It's a question out of curiosity only, really.
Regarding local disk, it's most likely faster, but I'll be provisioning enough IOPS on EBS. The reason I'm using EBS is that it makes taking snapshots easier. My question is whether it'd make sense to just remove EBS from the master nodes (as they don't save data), or whether I can at least safely provision fewer IOPS and less disk space. It looks like 3 master nodes are perfectly capable of handling a large cluster (way larger than mine).
Looked into the S3 repository plugin, but I'm not entirely sure if it's a replacement for backups, and how fast it'll sync writes to the repository.
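For reference, here is a rough sketch of the two snapshot API requests involved when using an S3 repository. It only builds the request paths and bodies and sends nothing. The repository name, bucket name and snapshot name are made-up placeholders, and it assumes the repository-s3 plugin is installed on every node. Note that a snapshot is a point-in-time copy taken on request (incrementally after the first one); as far as I know it is not a continuous sync of writes.

```python
import json


def s3_repository_request(repo_name, bucket):
    """Build the request that registers an S3 snapshot repository.

    repo_name and bucket are hypothetical placeholders, not values
    from this thread.
    """
    path = "/_snapshot/{}".format(repo_name)
    body = {"type": "s3", "settings": {"bucket": bucket}}
    return "PUT", path, body


def create_snapshot_request(repo_name, snapshot_name):
    """Build the request that takes one point-in-time snapshot."""
    path = "/_snapshot/{}/{}?wait_for_completion=true".format(
        repo_name, snapshot_name
    )
    return "PUT", path, None


if __name__ == "__main__":
    requests_to_send = [
        s3_repository_request("my_s3_repo", "my-es-backups"),
        create_snapshot_request("my_s3_repo", "snapshot_1"),
    ]
    for method, path, body in requests_to_send:
        print(method, path)
        if body is not None:
            print(json.dumps(body, indent=2))
```

The printed requests can be sent with curl or any HTTP client against one of the nodes.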
Great! As you mentioned the managed service on AWS, I thought you were speaking of the AWS Elasticsearch Service, which is not Elastic Cloud.
If the master nodes are not storing any data, can I safely remove the EBS storage?
Yes. Local disks are preferable. Note that a master node does store some data: the cluster state is persisted on disk.
On data insertion, it really looks like the master nodes are not suffering at all, while the data nodes are doing the heavy work. Is there anything against moving the master nodes to smaller instance types?
No. It's perfectly fine to run master-only nodes on small instances. If you don't hold many indices or big mappings, then instances with 2 or 4 GB of RAM are probably enough.
I'm thinking of adding an ingest-only node, but I'm wondering: Can I add just one? What if it goes down for any reason? Is it a recommended practice to have at least 3 nodes of each type to guarantee cluster stability?
Yes, you can. But I'd have at least 2 of them for the reasons you gave. There is no "election" process for ingest nodes, so you don't need at least 3 of them as you do for master nodes.
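To make the difference between the node types concrete, here is a minimal sketch that prints the role-related `elasticsearch.yml` lines for a dedicated master, data or ingest node on 5.x, plus the master quorum for `discovery.zen.minimum_master_nodes`. The helper names are made up for illustration; the quorum is the usual (master-eligible nodes / 2) + 1, which is why 3 master nodes are recommended but 2 ingest nodes are fine.

```python
# Role flags as they would appear in elasticsearch.yml on 5.x.
# The dictionary layout and helper names are illustrative only.
ROLE_FLAGS = {
    "master": {"node.master": True, "node.data": False, "node.ingest": False},
    "data": {"node.master": False, "node.data": True, "node.ingest": False},
    "ingest": {"node.master": False, "node.data": False, "node.ingest": True},
}


def minimum_master_nodes(master_eligible):
    """Quorum of master-eligible nodes: (n // 2) + 1."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def render_elasticsearch_yml(role, master_eligible):
    """Return the role-related settings for one node as YAML lines."""
    settings = dict(ROLE_FLAGS[role])
    settings["discovery.zen.minimum_master_nodes"] = minimum_master_nodes(
        master_eligible
    )
    # YAML booleans are lowercase, so True/False become true/false.
    return "\n".join(
        "{}: {}".format(key, str(value).lower())
        for key, value in settings.items()
    )


if __name__ == "__main__":
    # An ingest-only node in a cluster with 3 master-eligible nodes.
    print(render_elasticsearch_yml("ingest", 3))
```

With 3 master-eligible nodes the quorum is 2, so the cluster can lose one master and still elect a new one. Ingest nodes take no part in that vote, so their count only affects availability and throughput of pipeline processing.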
Always use local storage, remote filesystems such as NFS or SMB should be avoided. Also beware of virtualized storage such as Amazon’s Elastic Block Storage. Virtualized storage works very well with Elasticsearch, and it is appealing since it is so fast and simple to set up, but it is also unfortunately inherently slower on an ongoing basis when compared to dedicated local storage. If you put an index on EBS, be sure to use provisioned IOPS otherwise operations could be quickly throttled.