Resource allocation, data vs master nodes

Hi,

I was wondering how data-only nodes work. Do they also load the data into memory? Do I need to allocate large-memory machines for them?

Can you explain what you mean here?

It depends. You can load test data, search it, and look at /_stats?pretty to see memory usage. I've never found an official formula for calculating the RAM you need, so every value has to be worked out by hand, using your own data and your own usage pattern. But beware of randomly generated data: measurements taken on it can be far from the same measurements on real data.
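To make the "look at /_stats" step a bit more concrete, here is a minimal sketch that pulls a few memory counters out of the stats response. It assumes the 1.x/2.x-era response layout (`_all` → `total` → section → field); the exact field names, especially the `segments` breakdown, vary between versions, so check them against your own output. `fetch_stats`, `memory_summary` and `human` are hypothetical helper names, and the default URL is just an assumed local node.

```python
import json
import urllib.request


def fetch_stats(base_url="http://localhost:9200"):
    """Fetch cluster-wide index stats (GET /_stats?pretty) as a dict."""
    with urllib.request.urlopen(base_url + "/_stats?pretty") as resp:
        return json.load(resp)


def memory_summary(stats):
    """Pull a few memory-related counters out of an _stats response.

    Assumes the layout stats["_all"]["total"][section][field];
    missing sections or fields are reported as 0 instead of raising.
    """
    total = stats.get("_all", {}).get("total", {})
    wanted = {
        "fielddata": ("fielddata", "memory_size_in_bytes"),
        "segments": ("segments", "memory_in_bytes"),
        "terms": ("segments", "terms_memory_in_bytes"),
    }
    return {
        name: total.get(section, {}).get(field, 0)
        for name, (section, field) in wanted.items()
    }


def human(n_bytes):
    """Format a byte count as MiB with one decimal place."""
    return f"{n_bytes / (1024 * 1024):.1f} MiB"
```

Against a test cluster loaded with real data, `memory_summary(fetch_stats())` taken before and after a representative search or aggregation load gives you numbers to extrapolate from, which is the manual approach described above rather than a formula.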

In my case I got stuck on terms held in memory (too many small events with distinct terms, kept for a long retention period). Others get stuck on fielddata when sorting or aggregating; that can be fixed with doc_values.
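As a rough illustration of the doc_values fix, the sketch below builds the `properties` part of a mapping that keeps sort/aggregation fields on disk as doc values instead of in heap fielddata. It assumes 1.x/2.x-era mapping syntax (`"type": "string"` with `"index": "not_analyzed"`; on 1.x `doc_values` must be set explicitly, on 2.x it is already the default for such fields), and the field names are hypothetical. Newer versions use `keyword` fields instead.

```python
import json


def doc_values_mapping(fields):
    """Build a mapping "properties" body enabling doc_values on the given fields.

    Assumes 1.x/2.x-era syntax: exact-value string fields are
    "not_analyzed", and doc_values is set explicitly.
    """
    properties = {
        name: {"type": "string", "index": "not_analyzed", "doc_values": True}
        for name in fields
    }
    return {"properties": properties}


# Hypothetical field names that are used for sorting or aggregations.
body = doc_values_mapping(["host", "event_type"])
print(json.dumps(body, indent=2))
```

Doc values generally have to be in place when documents are indexed, so on an existing index this usually means a new index (or a new field) plus reindexing.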

What I meant to ask (and I hope I can make this as clear as possible) is this.

If I specify the following in elasticsearch.yml
node.master: false
node.data: true

This should create a data-only node.
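For comparison, the same two flags define the other common node types. This small sketch renders the elasticsearch.yml lines for each; it assumes the `node.master`/`node.data` flags used in the 1.x to 6.x era (7.9 and later use `node.roles` instead), and the role labels in `ROLES` are just names chosen for the example.

```python
# Role flags as used in elasticsearch.yml in the 1.x-6.x era
# (replaced by node.roles in 7.9+). Labels are illustrative.
ROLES = {
    "master_only": {"node.master": True, "node.data": False},
    "data_only": {"node.master": False, "node.data": True},
    "client": {"node.master": False, "node.data": False},  # coordinating only
}


def render_role(role):
    """Return the elasticsearch.yml lines for one of the roles above."""
    settings = ROLES[role]
    return "\n".join(f"{key}: {str(value).lower()}" for key, value in settings.items())


print(render_role("data_only"))
# node.master: false
# node.data: true
```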

My question is: would this node also load the index into memory? Or would it act only as a storage location, from which the master-only nodes fetch the data and load it into memory on the master node?

ES doesn't load the index into memory. It may load some documents into memory, but it doesn't keep them there.
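It may also help to see where the work happens. The data node is not a passive store that the master reads from: each data node searches the shards it holds, and a coordinating node merges the partial results. The toy model below (not Elasticsearch code; the scoring is just a term count and the node and document names are made up) sketches that scatter-gather shape, which is why the data nodes are the ones that need the heap for searching.

```python
import heapq


class DataNode:
    """Toy data node: it owns shards and searches them itself."""

    def __init__(self, name, shards):
        self.name = name
        self.shards = shards  # list of shards; each shard is a list of (doc_id, text)

    def search(self, term, k):
        """Score docs on local shards (toy score: term count) and return the top k."""
        hits = []
        for shard in self.shards:
            for doc_id, text in shard:
                score = text.split().count(term)
                if score > 0:
                    hits.append((score, doc_id))
        return heapq.nlargest(k, hits)


def coordinate(nodes, term, k):
    """Toy coordinating step: fan the query out, then merge the partial top-k lists."""
    partial = [hit for node in nodes for hit in node.search(term, k)]
    return heapq.nlargest(k, partial)


nodes = [
    DataNode("data-1", [[("d1", "error disk error"), ("d2", "ok")]]),
    DataNode("data-2", [[("d3", "error"), ("d4", "error error error")]]),
]
print(coordinate(nodes, "error", 2))  # [(3, 'd4'), (2, 'd1')]
```

Only small top-k lists travel between nodes; the master is not in this path at all, since its job is managing cluster state.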

Also, if you only have one master, you shouldn't be running all queries and index requests through it: if it runs out of memory (OOM), you lose your cluster.
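If your client code picks the node to talk to, one simple way to follow this advice is to spread requests across the non-master nodes. The sketch below is a hypothetical client-side helper (`RequestRouter` is not an Elasticsearch API); it assumes you supply node names and roles yourself and it never contacts a cluster.

```python
import itertools


class RequestRouter:
    """Round-robin over nodes that are not the master node.

    Hypothetical helper: node names and roles are supplied by the caller;
    nothing here talks to a real cluster.
    """

    def __init__(self, nodes):
        # nodes: dict mapping node name -> set of roles, e.g. {"data"} or {"master"}
        targets = sorted(name for name, roles in nodes.items() if "master" not in roles)
        if not targets:
            raise ValueError("no non-master node available to receive requests")
        self._cycle = itertools.cycle(targets)

    def next_node(self):
        """Return the node that should receive the next query or index request."""
        return next(self._cycle)


router = RequestRouter({
    "master-1": {"master"},
    "data-1": {"data"},
    "data-2": {"data"},
})
print([router.next_node() for _ in range(4)])  # ['data-1', 'data-2', 'data-1', 'data-2']
```

Keeping the master out of the request path means a heavy query can exhaust a data node's heap without taking down the node that holds the cluster together.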