I am trying to understand and implement a hot-warm architecture, and I have a couple of questions:
- Suppose a cluster has 2 nodes, one hot and one warm, each on a separate server, and I have moved some indices to the warm node using Curator. When I fire a query, which node will it hit, hot or warm? I can see all the indices on both nodes; only their setting has been changed to "node_type": "warm". Both of my nodes act as master and data nodes, and both are on SSD.
- If the query hits the hot node all the time, how can I utilize and query the indices on my warm node?
Sorry for asking such a basic question.
Please guide.
In a cluster with only 2 nodes and identical specification, I don't think a hot/warm architecture makes sense. In a hot/warm architecture every index resides exclusively in a single zone, so if you want to have replicas in order to provide high availability, you need at least 2 nodes per zone.
OK, so what is your recommendation for my scenario? I have a daily index that should be active/live for 30 days and is then archived, though it will still remain searchable and be retained for 90 days, with fewer search queries against it. I still don't get how a hot/warm architecture can help me.
In a hot/warm architecture every index resides exclusively in a single zone
Also, what does this mean? Does zone mean cluster?
Do you have a replica configured for your indices so that you do not lose access to data if one of the nodes goes down? Are you keeping the data searchable in the cluster for 30 or 90 days?
Yes, I do have a replica; it is 1 for each index.
The most recent 30 days of data is heavily searched; users can search it all the time. The analytics team wants 30 days of data available at all times, but once data is older than 30 days it is searched less often; they will ask for dashboards or reports as and when required.
OK, I think I get this; please correct me if I am wrong. In my scenario I would keep 30 days of data on the hot node, move older data to a warm node in another cluster/server on spinning disks, and close those indices on the hot node?
With only 2 nodes you need to keep all data you want searchable on all nodes if you want to use a replica, which is generally recommended.
The purpose of a hot/warm architecture is to have different types of nodes handle different types of data and workloads. You typically have more powerful nodes backed by fast disks power the hot zone (a set of nodes in a cluster), and these handle the most recent data and therefore all indexing. Once indices are a few days old and no longer indexed into, they are in practice read-only, and at this point they can be relocated to the nodes that make up the warm zone through shard allocation filtering. These nodes can have slower disks and possibly less CPU, as they only serve queries against older data and do not perform any indexing.
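As a rough illustration of that relocation step, here is a minimal sketch that builds the index settings update asking Elasticsearch to move an index to the warm zone. It assumes the custom node attribute is called `node_type` (the name used in the question) and that each node declares it in `elasticsearch.yml`; the helper `relocation_request` and the index name are hypothetical, and the sketch only builds the request rather than sending it.

```python
import json

# Assumed custom node attribute, taken from the "node_type" setting mentioned
# in the question. Each node would declare it in elasticsearch.yml, e.g.
#   node.attr.node_type: hot    (or: warm)
ATTRIBUTE = "node_type"


def relocation_request(index_name, zone):
    """Build the update-settings request that pins an index to one zone.

    Returns (HTTP method, path, JSON body). Nothing is sent to a cluster.
    """
    if zone not in ("hot", "warm"):
        raise ValueError(f"unknown zone: {zone!r}")
    path = f"/{index_name}/_settings"
    # Shards of the index may only live on nodes whose attribute matches.
    body = {f"index.routing.allocation.require.{ATTRIBUTE}": zone}
    return "PUT", path, body


if __name__ == "__main__":
    # Hypothetical daily index name, for illustration only.
    method, path, body = relocation_request("logs-2019.01.01", "warm")
    print(method, path)
    print(json.dumps(body, indent=2))
```

Curator's allocation action applies the same kind of setting, so this is mainly useful for seeing what actually changes on the index.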
Each index is located entirely on the nodes that make up its zone, which means that if you query recent data you will hit only the hot nodes, while you may hit all nodes if you query a longer period.
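To make that concrete, here is a small sketch that works out which zones a time-range query would touch with daily indices. The 30-day hot period and 90-day retention come from the scenario described in this thread; the function names are made up for illustration, and the model ignores anything other than index age.

```python
from datetime import date, timedelta

HOT_DAYS = 30        # assumption: indices stay in the hot zone for 30 days
RETENTION_DAYS = 90  # assumption: indices are removed after 90 days


def zone_for(index_day, today):
    """Return the zone holding the daily index for index_day, or None."""
    age = (today - index_day).days
    if age < 0 or age >= RETENTION_DAYS:
        return None  # not created yet, or already past retention
    return "hot" if age < HOT_DAYS else "warm"


def zones_hit(start, end, today):
    """Return the set of zones a query over [start, end] would touch."""
    zones = set()
    day = start
    while day <= end:
        zone = zone_for(day, today)
        if zone is not None:
            zones.add(zone)
        day += timedelta(days=1)
    return zones


if __name__ == "__main__":
    today = date(2019, 4, 1)
    # Last week only: served by the hot zone.
    print(zones_hit(today - timedelta(days=7), today, today))
    # Last 60 days: spans both zones.
    print(zones_hit(today - timedelta(days=60), today, today))
```

A dashboard over the last week would only involve the hot nodes, while a 60-day report would fan out to both zones.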
In order to allow indices to have replicas, you need at least 2 hot nodes and 2 warm nodes.
Read this blog post for further details.