I'm trying to figure out the server configuration for an Elasticsearch (ES) cluster design.
My current setup is 3 master nodes and 9 data nodes, each with 2 GB of RAM.
I want to index a large amount of data (~120 GB). Currently, when I index in bulk from multiple threads, the JVM quickly runs out of memory. If I ease the load by indexing more slowly, the time it takes to index all the data becomes unreasonable.
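To make "easing the load" concrete, here is a minimal sketch of capping each bulk request by size rather than sending everything at once. It is not my actual indexer: the function name `bulk_payloads`, the index name, and the 5 MB default cap are hypothetical, and it assumes the `_bulk` NDJSON format without a `_type` field (as in recent ES versions).

```python
import json
from typing import Iterable, Iterator


def bulk_payloads(
    docs: Iterable[dict],
    index_name: str,
    max_bytes: int = 5 * 1024 * 1024,  # hypothetical cap, tune for the cluster
) -> Iterator[str]:
    """Yield NDJSON bodies for the _bulk endpoint, each capped at max_bytes.

    A single document larger than max_bytes is still emitted, alone.
    """
    # Action line that precedes every document in a bulk body.
    action = json.dumps({"index": {"_index": index_name}})
    lines: list[str] = []
    size = 0
    for doc in docs:
        entry = action + "\n" + json.dumps(doc) + "\n"
        entry_size = len(entry.encode("utf-8"))
        # Flush the current batch before it would exceed the cap.
        if lines and size + entry_size > max_bytes:
            yield "".join(lines)
            lines, size = [], 0
        lines.append(entry)
        size += entry_size
    if lines:
        yield "".join(lines)
```

Each yielded string would be sent as the body of one bulk request, so the cap and the number of sender threads together bound how much data the cluster has to hold in flight at once.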
My question is: which nodes take the load of indexing? That is, which type of node, master or data, should have more RAM? I'm guessing the data nodes, because that's where the data goes to be indexed, whereas the master nodes only manage indexing.
Am I on the right track here? My concern is balancing performance against cost.