ES Stops Working Randomly


(Eric Luellen) #1

Hello,

I have two ES servers that are being fed by one Logstash server, with the
logs viewed in Kibana. This is a POC to work out any issues before going
into production. The system has run for about a month, and every few days
Kibana stops showing logs at some random time in the middle of the night.
Last night, the last log entry I received in Kibana was around 18:30. When
I checked on the ES servers, the master showed as running and the secondary
as not running (from /sbin/service elasticsearch status), but I was able to
curl the secondary on localhost and it returned information, so I'm not
sure what's up with that. Anyway, when I run a status on the master node, I
get this:

curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
{
"cluster_name" : "gis-elasticsearch",
"status" : "red",
"timed_out" : false,
"number_of_nodes" : 6,
"number_of_data_nodes" : 2,
"active_primary_shards" : 186,
"active_shards" : 194,
"relocating_shards" : 0,
"initializing_shards" : 7,
"unassigned_shards" : 249
}
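
To see exactly which shards are stuck, something like this should show the
per-shard state (assuming the _cat API, which I believe is available from
1.0 on):

# list every shard with its state (STARTED/INITIALIZING/UNASSIGNED) and node
curl -XGET 'http://localhost:9200/_cat/shards?v'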

When I view the indices via "ls ...nodes/0/indices/", they all show as
modified today for some reason, and there are new files for today's date.
So I think the cluster is starting to catch back up after I restarted both
servers, but I'm not sure why it failed in the first place. When I look at
the logs on the master, I only see four transport warnings at 18:57 and
then the secondary leaving the cluster. I don't see any logs on the
secondary (Pistol) about why it stopped working or what truly happened.
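
Rather than poking at the data directory, the same information should be
available through the API; per-index health, doc counts, and sizes should
come from something like this (again assuming _cat exists on this version):

# one row per index, with health, shard counts, and store size
curl -XGET 'http://localhost:9200/_cat/indices?v'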

[2014-03-06 18:57:04,121][WARN ][transport ] [ElasticSearch
Server1] Transport response handler not found of id [64147630]
[2014-03-06 18:57:04,124][WARN ][transport ] [ElasticSearch
Server1] Transport response handler not found of id [64147717]
[2014-03-06 18:57:04,124][WARN ][transport ] [ElasticSearch
Server1] Transport response handler not found of id [64147718]
[2014-03-06 18:57:04,124][WARN ][transport ] [ElasticSearch
Server1] Transport response handler not found of id [64147721]

[2014-03-06 19:56:08,467][INFO ][cluster.service ] [ElasticSearch
Server1] removed
{[Pistol][sIAMHNj6TMCmrMJGW7u97A][inet[/10.1.1.10:9301]]{client=true,
data=false},}, reason:
zen-disco-node_failed([Pistol][sIAMHNj6TMCmrMJGW7u97A][inet[/10.13.3.46:9301]]{client=true,
data=false}), reason failed to ping, tried [3] times, each with maximum
[30s] timeout
[2014-03-06 19:56:12,304][INFO ][cluster.service ] [ElasticSearch
Server1] added
{[Pistol][sIAMHNj6TMCmrMJGW7u97A][inet[/10.1.1.10:9301]]{client=true,
data=false},}, reason: zen-disco-receive(join from
node[[Pistol][sIAMHNj6TMCmrMJGW7u97A][inet[/10.13.3.46:9301]]{client=true,
data=false}])
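
The "failed to ping, tried [3] times, each with maximum [30s] timeout" line
matches the default zen fault-detection settings, and a node that is
unresponsive for that long is often sitting in a stop-the-world GC. One
thing I'm considering is loosening those settings in elasticsearch.yml on
both nodes; the setting names are real, but the values below are just a
guess on my part:

# elasticsearch.yml -- zen fault-detection tuning (values are guesses)
discovery.zen.fd.ping_timeout: 60s   # default is 30s
discovery.zen.fd.ping_retries: 6     # default is 3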

Any ideas on additional logging or troubleshooting I can turn on to keep
this from happening in the future? Since the shards are not caught up yet,
right now I'm just seeing a lot of debug messages about failing to parse
searches. I'm assuming those will clear up once the cluster catches up.

[2014-03-07 10:06:52,235][DEBUG][action.search.type ] [ElasticSearch
Server1] All shards failed for phase: [query]
[2014-03-07 10:06:52,223][DEBUG][action.search.type ] [ElasticSearch
Server1] [windows-2014.03.07][3], node[W6aEFbimR5G712ddG_G5yQ], [P],
s[STARTED]: Failed to execute
[org.elasticsearch.action.search.SearchRequest@74ecbbc6] lastShard [true]
org.elasticsearch.search.SearchParseException: [windows-2014.03.07][3]:
from[-1],size[-1]: Parse Failure [Failed to parse source
[{"facets":{"0":{"date_histogram":{"field":"@timestamp","interval":"10m"},"global":true,"facet_filter":{"fquery":{"query":{"filtered":{"query":{"query_string":{"query":"(ASA
AND
Deny)"}},"filter":{"bool":{"must":[{"range":{"@timestamp":{"from":1394118412373,"to":"now"}}}]}}}}}}}},"size":0}]]


