I have an ingest-heavy workload (many more writes than reads). I am curious about the need / benefits of having ingest-only nodes vs having the ingest done on the data nodes.
I don't think I actually have a specific ingest pipeline, but my question is whether things like indexing / analyzing (e.g. stemming) happen on the ingest nodes or not. Is it beneficial for me to offload this processing to ingest-only nodes, or is that only suggested if you have a specific "ingest pipeline"?
Dedicated ingest nodes are generally only required if you are using ingest pipelines. Without pipelines they will act as coordinating nodes, which can be beneficial, but they do not do a lot of work. Indexing and analysis are all done on the data nodes.
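If you are not sure whether any pipelines are actually running, one way to check is to look at the per-node ingest counters from the node stats API (`GET _nodes/stats/ingest`). Below is a minimal Python sketch that reads an already-fetched response and reports how many documents each node has pushed through ingest pipelines. The helper names and the sample data are made up for illustration, and the response shape (`nodes.<id>.ingest.total.count`) is assumed from the node stats output, so verify it against your version.

```python
from typing import Any, Dict


def ingest_docs_per_node(nodes_stats: Dict[str, Any]) -> Dict[str, int]:
    """Map node name -> documents processed by ingest pipelines.

    Assumes the body of `GET _nodes/stats/ingest`, where each node entry
    carries an `ingest.total.count` counter. Missing fields count as 0.
    """
    counts: Dict[str, int] = {}
    for node_id, node in nodes_stats.get("nodes", {}).items():
        total = node.get("ingest", {}).get("total", {})
        counts[node.get("name", node_id)] = int(total.get("count", 0))
    return counts


def pipelines_in_use(nodes_stats: Dict[str, Any]) -> bool:
    """True if any node has processed at least one document via a pipeline."""
    return any(count > 0 for count in ingest_docs_per_node(nodes_stats).values())


# Hypothetical sample response (not real cluster output).
sample = {
    "nodes": {
        "abc123": {"name": "data-1", "ingest": {"total": {"count": 0}}},
        "def456": {"name": "data-2", "ingest": {"total": {"count": 0}}},
    }
}

print(ingest_docs_per_node(sample))
print(pipelines_in_use(sample))
```

If every counter stays at zero while you are indexing, no pipelines are being applied, and dedicated ingest nodes would only be acting as coordinating nodes for you.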