I'm planning to deploy ECK (using operator) in Azure with the following structure:
Master: 3
Data: 3
Coordinator: 1
Ingest: 1
Is it a good start point?
How much memory, CPU, and disk should I allocate for each kind of node? (Just an idea.) At the moment, for my tests, I have everything set up with default values (2 GB RAM / 1 GB disk).
It is very hard, if not impossible, to give a general recommendation for your question. You have to test your setup against your specific use case and see if it holds up.
Two observations nonetheless:
- 2 GB RAM / 1 GB storage is at the lowest end of the spectrum, and I doubt it will be sufficient unless your use case involves only trivial amounts of data.
- If you are unsure about your requirements, then splitting the node roles into dedicated master, data, ingest, and even coordinator nodes as you have suggested might be premature. How about starting with a three-node cluster where all nodes fulfill all roles, and splitting up the roles once you know you have that requirement?
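For reference, a three-node all-roles cluster can be expressed in a single ECK `nodeSets` entry; when `node.roles` is not set in the config, each node takes on all roles by default. The cluster name, version, resource sizes, and Azure storage class below are illustrative assumptions, not recommendations:

```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart          # assumed name
spec:
  version: 8.13.0           # assumed version
  nodeSets:
  - name: default
    count: 3                # three nodes, each holding all roles
    podTemplate:
      spec:
        containers:
        - name: elasticsearch
          resources:
            requests:
              memory: 4Gi   # placeholder sizing; tune to your workload
              cpu: 1
            limits:
              memory: 4Gi
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 50Gi                 # placeholder disk size
        storageClassName: managed-premium # assumed Azure disk class
```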
Regarding the node configuration, it's only for test purposes at the moment. No doubt I'll have to increase RAM and storage.
During the weekend I was wondering the same thing you point out in your second observation. In that case, should I keep only 3 full-role nodes, or would using more be a smart move? Or maybe use 3 master and 3 data/ingest nodes?
I think the simple three node setup I suggested is a good starting point. ECK allows you to transition to a more complicated node topology later on if needed.
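If that requirement does materialize later, the split can be sketched by adding dedicated `nodeSets` with explicit `node.roles`; ECK handles the rolling transition. Counts and names below are placeholders:

```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart          # assumed name
spec:
  version: 8.13.0           # assumed version
  nodeSets:
  - name: master
    count: 3
    config:
      node.roles: ["master"]           # dedicated master nodes
  - name: data
    count: 3
    config:
      node.roles: ["data", "ingest"]   # combined data/ingest nodes
```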