We have 26 nodes in a cluster: three master nodes, and the rest are data nodes.
There are heavy writes happening throughout the day.
We saw that on one of the nodes an old-generation GC ran for a very long time, more than 19 hours, and after it completed, applications started to time out. I see that the data on that node gradually dropped from 1.4 TB to 8 GB. I immediately stopped Elasticsearch on that node.
I have two questions:
Did Elasticsearch remove this data automatically, considering it stale?
Can applications point to the master nodes? Right now they point to a few data nodes. By pointing to a master node, would requests get routed correctly to the right responding node?
It may have reallocated data away from the node it deemed to be having problems, but it will not delete data.
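You can confirm that the shards were moved rather than deleted by querying the cluster directly. A rough sketch, assuming the cluster is reachable on `localhost:9200` (adjust the host, port, and credentials for your setup; the index name in the last call is a placeholder):

```shell
# List every shard with the node it currently lives on; shards that
# used to be on the problem node should now appear on other data nodes.
curl -s 'localhost:9200/_cat/shards?v'

# Per-node disk usage and shard counts, to confirm the data moved
# to the remaining nodes rather than disappeared.
curl -s 'localhost:9200/_cat/allocation?v'

# If any shard shows as UNASSIGNED, ask the cluster why
# ("my-index" and shard 0 are placeholders).
curl -s -H 'Content-Type: application/json' \
  'localhost:9200/_cluster/allocation/explain' \
  -d '{"index": "my-index", "shard": 0, "primary": true}'
```

These are operational commands against a live cluster, so the exact output depends on your deployment.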
Dedicated master nodes should not serve traffic. They should be left to manage the cluster, so there is minimal chance they will suffer from long GC pauses.
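For reference, a dedicated master is typically configured with the data role switched off, so it cannot hold shards or serve indexing/search traffic. A minimal `elasticsearch.yml` sketch (this is the 6.x/7.x-style syntax; newer versions use `node.roles: [ master ]` instead):

```yaml
# elasticsearch.yml on a dedicated master node
node.master: true
node.data: false
node.ingest: false
```

Clients should keep pointing at data (or dedicated coordinating) nodes; any of those will route requests to the node holding the relevant shard.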
It didn't delete the data overall in the cluster; I mean there is no data discrepancy at the cluster level. I also agree that the remaining nodes would have taken over the data from the node with the GC issue. But in due course, do you think it will clean up the problem node and repopulate it with new data? I see that node's data consumption graph slowly decreasing.
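Once the node rejoins the cluster in a healthy state, the allocator will rebalance shards back onto it over time. You can watch that happen with the recovery and health APIs (again assuming `localhost:9200`; adjust for your cluster):

```shell
# Show only in-flight shard recoveries; type "peer" means the shard
# is being copied over from another node's copy.
curl -s 'localhost:9200/_cat/recovery?v&active_only=true'

# Cluster health reports relocating/initializing shard counts while
# rebalancing is in progress.
curl -s 'localhost:9200/_cluster/health?pretty'
```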