I have a test cluster of 6+ data nodes and 3 dedicated (dataless) master nodes, holding 100+ indices of about 20GB each. It is deployed in AWS using zen discovery, with mostly default configs for discovery and recovery. Most of the activity is indexing, at fairly modest rates.
I am starting to see nodes dropping out of the cluster. Some stay out, and some eventually rejoin. This affects both data and master nodes, although I have not yet seen the elected master drop out.
The logs seem to show zen ping timeouts on both master and data nodes. This is puzzling because the default fault detection (fd) timeout is 30s with 3 retries. Increasing these to 60s and 6 retries, respectively, seems to help a little, but it really just delays the problem.
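For reference, here is a rough sketch of why raising these values only postpones the drop. It assumes a node is removed once `ping_retries` consecutive fault detection pings have each waited the full `ping_timeout` (the `discovery.zen.fd.ping_timeout` and `discovery.zen.fd.ping_retries` settings), and it ignores the interval between pings. The function name is made up for illustration.

```python
def worst_case_detection_seconds(ping_timeout_s: float, ping_retries: int) -> float:
    """Rough upper bound on how long an unresponsive node stays in the cluster.

    Assumption: every retry waits the full timeout before failing, and the
    node is dropped after the last retry fails. The ping interval is ignored.
    """
    return ping_timeout_s * ping_retries


# Default fd settings from the post: 30s timeout, 3 retries.
defaults = worst_case_detection_seconds(30, 3)

# Raised settings from the post: 60s timeout, 6 retries.
raised = worst_case_detection_seconds(60, 6)

print(f"defaults: up to {defaults}s before a silent node is dropped")
print(f"raised:   up to {raised}s before a silent node is dropped")
```

Under that assumption, the raised settings give a node that has stopped responding four times as long before it is dropped, but do nothing about whatever is keeping it from answering pings in the first place.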
Is anyone seeing similar issues, and are there possible solutions? Is there any way to keep a node, particularly a master, from failing to respond to zen pings without setting the timeouts so high? 30s should be more than sufficient. This seems like a very serious issue if basic, built-in cluster management operations begin to fail.
I'm seeing this with both 0.90.1 and 0.90.2.
Thanks,
-Vinh
On Jul 19, 2013, at 9:42 AM, Vinh Nguyen vinh@loggly.com wrote: [quoted text repeats the opening of this post]