Is there a way to select which nodes/node group gets which primary shards and which replicas?
I am using a hot/warm setup.
I push about 1.1 billion rows an hour through my indices at peak times.
I am thinking of modifying my setup to have 5 nodes using RAM disks that hold 2 hours of data and move the 3rd hour to the SSD layer. These would be "burning" nodes where the primary shards are initially allocated, while the replica shards are placed on the hot nodes' SSD tier.
Is this possible?
Otherwise I guess I would just run 5 nodes with RAM disks and 1 replica, then move the indices every 3rd hour to the SSD layer, holding them there for 48 hours before moving them to warm nodes with spinning-disk storage.
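For context, the usual way to move an index between layers like this is shard allocation filtering on a custom node attribute; a minimal sketch (the `box_type` attribute name and the index name are assumptions, not my actual config):

```
# elasticsearch.yml on the RAM-disk ("burning") nodes
node.attr.box_type: burning

# elasticsearch.yml on the SSD hot nodes
node.attr.box_type: hot
```

```
# Pin a newly created index to the burning (RAM-disk) tier
PUT logs-2018.01.01-00/_settings
{
  "index.routing.allocation.require.box_type": "burning"
}

# Two hours later, move the whole index to the SSD tier
PUT logs-2018.01.01-00/_settings
{
  "index.routing.allocation.require.box_type": "hot"
}
```

This setting applies to every copy of a shard, so primaries and replicas move together.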
Both primary and replica shards perform essentially the same amount of indexing work, so performance will be driven by the slower node. You also cannot control primary shard placement, as Elasticsearch can promote a replica to primary whenever necessary.
This basically means that you will need to use hourly indices (or rollover indices with an hourly age limit). Given your 7-month total retention period, that will result in at least 5,000 indices (assuming a single index type; 7 months is roughly 5,000 hours) unless you reindex your data at some point. That sounds like a lot and could potentially be a problem.
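In case it helps, a rollover along those lines is just a scheduled call against a write alias; a minimal sketch, assuming a hypothetical alias named `logs_write`:

```
POST /logs_write/_rollover
{
  "conditions": {
    "max_age": "1h"
  }
}
```

Whatever runs the schedule (cron, Curator, etc.) only needs to repeat this call; the condition means a new backing index is created once the current one is an hour old.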
You also seem to have quite a small number of warm/cold nodes given the ingest rate and retention period. How much data do you plan on storing on these? If each hot node has 6TB of storage, each warm node would need ~84TB of disk, which is more than I think a single node can handle. Using the same logic, each cold node would need around 0.5PB of storage...
After data has been indexed for 7 days, I reindex it and drop the fields that are no longer needed after that time. Once the extra fields are dropped, the index size shrinks to something storable. Then, for the cold tier, I reindex again and keep only smaller slices of data matching predefined graph patterns (see the sketch below).
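Roughly what I mean by the pruning reindex; index names, field names and the time window here are made up for illustration. The `_source` list in the reindex source keeps only the listed fields, and the range query restricts it to the slice being moved:

```
POST _reindex
{
  "source": {
    "index": "logs-2018.01.01",
    "_source": ["@timestamp", "status", "bytes"],
    "query": {
      "range": {
        "@timestamp": {
          "gte": "2018-01-01T00:00:00Z",
          "lt":  "2018-01-02T00:00:00Z"
        }
      }
    }
  },
  "dest": {
    "index": "logs-pruned-2018.01.01"
  }
}
```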