Hi,
We have two Elasticsearch clusters in our development environment.
The first is our development cluster, which has 9 nodes:
- 4 Data nodes (with 4 GB heap)
- 3 Master eligible nodes (default heap)
- 2 Search Load Balancers (default heap)
The second is our monitoring cluster, which stores the Marvel data from the development cluster. It has 2 nodes running with the default configuration.
All of the above nodes are running ES 1.1.1 and Marvel 1.1.0, the latest version of each.
Lately we have been seeing issues in the Marvel (monitoring) cluster. One of its nodes continuously throws the following exception:
[.marvel-2014.04.25][0], node[dA2UtjgdQ1S55zgvQHOHYQ], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@24de815]
org.elasticsearch.search.SearchParseException: [.marvel-2014.04.25][0]: from[-1],size[-1]: Parse Failure [Failed to parse source [{"facets":{"0":{"date_histogram":{"key_field":"@timestamp","value_field":"total.search.query_total","interval":"1m"},"global":true,"facet_filter":{"fquery":{"query":{"filtered":{"query":{"query_string":{"query":"_type:indices_stats"}},"filter":{"bool":{"must":[{"range":{"@timestamp":{"from":1398434986844,"to":"now"}}}]}}}}}}}},"size":50,"query":{"filtered":{"query":{"query_string":{"query":"_type:cluster_event OR _type:node_event"}},"filter":{"bool":{"must":[{"range":{"@timestamp":{"from":1398434986844,"to":"now"}}}]}}}},"sort":[{"@timestamp":{"order":"desc"}},{"@timestamp":{"order":"desc"}}]}]]
at org.elasticsearch.search.SearchService.parseSource(SearchService.java:634)
at org.elasticsearch.search.SearchService.createContext(SearchService.java:507)
at org.elasticsearch.search.SearchService.createAndPutContext(SearchService.java:480)
at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:324)
at org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteFetch(SearchServiceTransportAction.java:304)
at org.elasticsearch.action.search.type.TransportSearchQueryAndFetchAction$AsyncAction.sendExecuteFirstPhase(TransportSearchQueryAndFetchAction.java:71)
at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:216)
at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$4.run(TransportSearchTypeAction.java:296)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: org.elasticsearch.search.facet.FacetPhaseExecutionException: Facet [0]: (value) field [total.search.query_total] not found
at org.elasticsearch.search.facet.datehistogram.DateHistogramFacetParser.parse(DateHistogramFacetParser.java:186)
at org.elasticsearch.search.facet.FacetParseElement.parse(FacetParseElement.java:93)
at org.elasticsearch.search.SearchService.parseSource(SearchService.java:622)
... 10 more
The exception repeats at regular intervals. It appears on only one of the 2 nodes of the monitoring cluster, usually the master.
Similar exceptions are shown on the Cluster Overview page of the Marvel dashboard.
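In case it helps anyone reproduce this: the root cause in the trace is that the facet's value field, total.search.query_total, is not found. Below is a minimal sketch of how one could check whether that field is present in the index mapping. It assumes the mapping (as returned by the get-mapping API) has already been loaded into a Python dict shaped as index -> "mappings" -> type -> "properties". The helper name field_in_mapping and the sample mapping are hypothetical and hand-written for illustration; they are not output from our cluster.

```python
def field_in_mapping(mapping, index, doc_type, dotted_field):
    """Return True if dotted_field (e.g. "a.b.c") exists in the mapping of doc_type.

    Assumes the mapping dict has the shape:
    {index: {"mappings": {doc_type: {"properties": {...}}}}}
    with object fields carrying their own nested "properties".
    """
    node = mapping.get(index, {}).get("mappings", {}).get(doc_type, {})
    for part in dotted_field.split("."):
        props = node.get("properties", {})
        if part not in props:
            return False
        node = props[part]
    return True


# Hypothetical, hand-written mapping used only to exercise the helper.
sample_mapping = {
    ".marvel-2014.04.25": {
        "mappings": {
            "indices_stats": {
                "properties": {
                    "@timestamp": {"type": "date"},
                    "total": {
                        "properties": {
                            "docs": {
                                "properties": {"count": {"type": "long"}}
                            }
                        }
                    },
                }
            }
        }
    }
}

index_name = ".marvel-2014.04.25"
missing = field_in_mapping(
    sample_mapping, index_name, "indices_stats", "total.search.query_total"
)
present = field_in_mapping(
    sample_mapping, index_name, "indices_stats", "total.docs.count"
)
print(missing, present)
```

If the check returns False against the real mapping of the day's .marvel index, that would be consistent with the "field not found" message above, although it would not by itself explain why the field is missing.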
In addition, on one of the master nodes of the development cluster, we see ClusterBlockException [shard state 0 not initialized or recovered] reported for the monitoring cluster.
Could you please explain why this is happening? One more thing to add: we have been facing this problem ever since we migrated to ES 1.1.0. Before that, while running 1.0.0, we did not observe anything like this.
Looking forward to your reply.