You have a few questions that I'm going to address separately:
are there any benefits of introducing new ingest only nodes?
If you are not using ingest pipelines, then no: ingest nodes only run ingest pipelines. You may be using ingest pipelines without realizing it if you are using any of the Beats family of data shippers, the Monitoring solution built into Elasticsearch, or a few other things. You can check if you have any ingest pipelines by running GET _ingest/pipeline against your cluster.
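For scripting that check, here is a minimal Python sketch that calls GET _ingest/pipeline and lists the pipeline IDs it returns. The base URL, the lack of authentication and TLS, and both function names are assumptions for illustration, so adjust them for your cluster.

```python
import json
from urllib.request import urlopen


def pipeline_ids(response_body: str) -> list[str]:
    """Return the sorted pipeline IDs from a GET _ingest/pipeline response body."""
    # The response is a JSON object keyed by pipeline ID.
    return sorted(json.loads(response_body))


def fetch_pipeline_ids(base_url: str = "http://localhost:9200") -> list[str]:
    """Query the cluster for its ingest pipelines (hypothetical helper)."""
    # base_url is an assumption; authentication and TLS are not handled here.
    with urlopen(f"{base_url}/_ingest/pipeline") as resp:
        return pipeline_ids(resp.read().decode("utf-8"))
```

If the list comes back empty, nothing in the cluster is using ingest pipelines, and dedicated ingest nodes would have no work to do.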
never to send write/read to master nodes, what are the motivations/goals behind this?
Basically, this is just so that your dedicated master nodes can be just that: dedicated to maintaining the cluster state and not have to spend resources on anything else. It would cause problems if you received a sudden spike of indexing or search traffic and the increased load caused a master node to slow down or crash, so it's safer, especially for larger clusters that handle a lot of traffic, to let your master nodes be master nodes and your data nodes be data nodes.
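To keep client traffic away from dedicated masters, one option is to build the client's host list from the nodes info API (GET _nodes/http) and skip any node whose only job is being a master. The sketch below assumes the response has already been parsed into a dict and that each node entry carries a "roles" list and an "http.publish_address" field; the function name is hypothetical.

```python
def client_facing_addresses(nodes_info: dict) -> list[str]:
    """Return HTTP addresses of nodes that are not dedicated masters."""
    master_only_roles = {"master", "voting_only"}
    addresses = []
    for node in nodes_info.get("nodes", {}).values():
        roles = set(node.get("roles", []))
        # A dedicated master is master-eligible and has no non-master roles.
        if "master" in roles and roles <= master_only_roles:
            continue
        address = node.get("http", {}).get("publish_address")
        if address:
            addresses.append(address)
    return sorted(addresses)


# Example with a hand-written response fragment (made-up addresses).
sample = {
    "nodes": {
        "n1": {"roles": ["master"], "http": {"publish_address": "10.0.0.1:9200"}},
        "n2": {"roles": ["data", "ingest"], "http": {"publish_address": "10.0.0.2:9200"}},
        "n3": {"roles": [], "http": {"publish_address": "10.0.0.3:9200"}},
    }
}
targets = client_facing_addresses(sample)
```

Here `targets` contains only the data/ingest node and the coordinating-only node, so indexing and search requests never land on the dedicated master.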
please point to any resources mentioning the system requirements for master only & ingest only & data only nodes
This is going to depend on your workload, but the hardware recommendations in the Definitive Guide are still very relevant, even if the Guide itself is a bit dated at this point.
It's effectively impossible to give specific guidance without being much more familiar with your needs, but very generally:
- Data nodes are going to be the most sensitive to changes in resources, and will need the most resources. Lots of RAM (although not more than 64GB per node, with half of physical memory dedicated to the Elasticsearch heap: the JVM can only use compressed oops while the heap stays below roughly 32GB, and Lucene is hungry for the filesystem cache that the other half provides), a decent amount of CPU, and lots of disk space on high-bandwidth disks, preferably SSDs.
- Master nodes are a bit less resource-hungry in general, although if you have lots of indices, lots of nodes, or frequent cluster state updates (settings changes, shard relocations, etc.), they may need to be more powerful. We typically recommend 3 dedicated master nodes, even for clusters with lots of data nodes (with all other nodes being non-master-eligible).
- Ingest and Coordinating (aka Client) nodes will typically need a good amount of CPU and RAM, but relatively little disk as they do not store much data.
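
The RAM guidance for data nodes boils down to a small piece of arithmetic: give the heap half of physical memory, but stop before the point where the JVM can no longer use compressed oops. A rough sketch follows; the 31GB cap is an assumption that approximates that cutoff (the exact threshold depends on the JVM), and the function name is hypothetical.

```python
def suggested_heap_gb(physical_ram_gb: float, compressed_oops_cap_gb: float = 31.0) -> float:
    """Suggest a heap size: half of RAM, capped below the compressed-oops cutoff."""
    # The other half of RAM is left to the OS filesystem cache, which Lucene relies on.
    # compressed_oops_cap_gb is an assumed safe ceiling, not an exact JVM limit.
    return min(physical_ram_gb / 2, compressed_oops_cap_gb)


small_node = suggested_heap_gb(16)   # half of RAM fits under the cap
large_node = suggested_heap_gb(128)  # the cap applies; extra RAM goes to filesystem cache
```

This is also why going far past 64GB of RAM per node gives diminishing returns for heap: the additional memory can only serve as filesystem cache.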
It's often worth taking some time to create a benchmarking suite using Rally, our custom-made tool to evaluate Elasticsearch performance, that emulates your workload so you can see how a cluster will respond to load. This will allow you to evaluate the performance of your cluster as you add nodes, remove nodes, or change the resources allocated to your nodes. If you do this, do it on a dedicated, temporary benchmarking cluster to avoid 1) impacting your production system, and 2) having any other traffic impact your benchmark results.