We're using unicast now (Rackspace doesn't allow multicast traffic).
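
For anyone following along, here is a rough sketch of what a unicast setup looks like in elasticsearch.yml. It assumes the zen discovery settings of the 0.x releases. The host list is illustrative only, not a copy of the actual config: the node names are taken from the logs below, and the assumption is that they resolve on the private network.

```yaml
# Turn off multicast pinging, since the network does not carry it
discovery.zen.ping.multicast.enabled: false

# List the nodes to ping directly (hypothetical host list; 9300 is the
# transport port that shows up in the logs)
discovery.zen.ping.unicast.hosts: ["prod-es-r03:9300", "prod-es-r06:9300"]
```

Every node needs the same kind of list, and it should include at least the master-eligible nodes so that a restarted node can find the cluster again.
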
Here's a sample of what's in the logs during the issues. This kind of
thing was streaming pretty much continuously:
[2012-01-16 02:52:41,711][WARN ][indices.cluster ] [prod-es-r03] [contact_documents-527859-0][0] master [[prod-es-r06][IfNWkYASSg-TOZuMI7nj5w][inet[/10.180.46.203:9300]]] marked shard as started, but shard have not been created, mark shard as failed
[2012-01-16 02:52:41,711][WARN ][cluster.action.shard ] [prod-es-r03] sending failed shard for [contact_documents-527859-0][0], node[zB6rqHbHQrm727WdL5iXrw], [R], s[STARTED], reason [master [prod-es-r06][IfNWkYASSg-TOZuMI7nj5w][inet[/10.180.46.203:9300]] marked shard as started, but shard have not been created, mark shard as failed]
[2012-01-16 02:52:41,880][WARN ][indices.cluster ] [prod-es-r03] [contact_documents-194054-1322678627][0] master [[prod-es-r06][IfNWkYASSg-TOZuMI7nj5w][inet[/10.180.46.203:9300]]] marked shard as started, but shard have not been created, mark shard as failed
[2012-01-16 02:52:41,880][WARN ][cluster.action.shard ] [prod-es-r03] sending failed shard for [contact_documents-194054-1322678627][0], node[zB6rqHbHQrm727WdL5iXrw], [R], s[STARTED], reason [master [prod-es-r06][IfNWkYASSg-TOZuMI7nj5w][inet[/10.180.46.203:9300]] marked shard as started, but shard have not been created, mark shard as failed]
[2012-01-16 02:52:41,894][WARN ][indices.cluster ] [prod-es-r03] [contact_documents-527859-0][0] master [[prod-es-r06][IfNWkYASSg-TOZuMI7nj5w][inet[/10.180.46.203:9300]]] marked shard as started, but shard have not been created, mark shard as failed
[2012-01-16 02:52:41,894][WARN ][cluster.action.shard ] [prod-es-r03] sending failed shard for [contact_documents-527859-0][0], node[zB6rqHbHQrm727WdL5iXrw], [R], s[STARTED], reason [master [prod-es-r06][IfNWkYASSg-TOZuMI7nj5w][inet[/10.180.46.203:9300]] marked shard as started, but shard have not been created, mark shard as failed]
On Jan 16, 2:57 pm, Ævar Arnfjörð Bjarmason ava...@gmail.com wrote:
You might want to try switching from multicast to unicast just to
eliminate a variable.
Some networks don't treat multicast traffic very well.
It's also useful to look at the logs for the ES nodes during these
outages. What do they say?