We have a 12-node cluster: 1 ingest node, 3 master nodes, 3 hot data nodes, 2 warm data nodes, 2 Kibana nodes (which also act as coordinating nodes), and 1 monitoring node.
We are trying to ingest data from our Hadoop environment into Elasticsearch using the ES-Spark plugin. We need to ingest the last 5 years of data, of which the most recent (5th) year should land on the hot nodes and years 1-4 on the warm nodes.
We don't see a way in the documentation to achieve this. Right now, when we run spark-submit against the ingest node, the entire dataset is loaded onto both hot and warm nodes, which we don't want. Is there a way to specify that certain data should be ingested into the hot nodes vs. the warm nodes?
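For context, here is a minimal sketch of the year-based split we are after. The index names (`data-hot`, `data-warm-<year>`) and the cutoff year are hypothetical placeholders, and actual hot/warm placement would still depend on index-level allocation settings; this only illustrates the per-record routing decision:

```python
def target_index(event_year: int, hot_year: int) -> str:
    """Pick a destination index for a record based on its year.

    Records from `hot_year` (or later) go to a hypothetical hot-tier index;
    older records go to per-year warm-tier indices. Names are placeholders.
    """
    if event_year >= hot_year:
        return "data-hot"
    return f"data-warm-{event_year}"

# Example: with 2024 as the hot year, a 2024 record routes to the hot index
# and a 2020 record routes to a warm index.
print(target_index(2024, hot_year=2024))  # data-hot
print(target_index(2020, hot_year=2024))  # data-warm-2020
```

If a per-record field carried this index name, the ES-Spark connector's multi-resource writes (a field pattern in `es.resource.write`, e.g. `data-{target}`) could direct each record to its own index, though we'd welcome confirmation that this is the intended approach.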
We would like to run this process daily, as the daily delta of data is quite large.
Any suggestions would be greatly appreciated.
Thank you very much.