SearchParseExceptions in Marvel monitoring cluster

Hi,

We have 2 Elasticsearch clusters in our development environment.
One of them is our development cluster, which has 9 nodes:

  • 4 Data nodes (with 4 GB heap)
  • 3 Master eligible nodes (default heap)
  • 2 Search Load Balancers (default heap)

The second is our monitoring cluster for storing Marvel data of the development cluster. This cluster has 2 nodes running with default configuration.
All the above nodes are running the latest ES version 1.1.1 and the latest Marvel version which is 1.1.0.

Of late we have been seeing issues in the Marvel cluster. One of the nodes in the Marvel cluster throws the following exception continuously:
[.marvel-2014.04.25][0], node[dA2UtjgdQ1S55zgvQHOHYQ], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@24de815]
org.elasticsearch.search.SearchParseException: [.marvel-2014.04.25][0]: from[-1],size[-1]: Parse Failure [Failed to parse source [{"facets":{"0":{"date_histogram":{"key_field":"@timestamp","value_field":"total.search.query_total","interval":"1m"},"global":true,"facet_filter":{"fquery":{"query":{"filtered":{"query":{"query_string":{"query":"_type:indices_stats"}},"filter":{"bool":{"must":[{"range":{"@timestamp":{"from":1398434986844,"to":"now"}}}]}}}}}}}},"size":50,"query":{"filtered":{"query":{"query_string":{"query":"_type:cluster_event OR _type:node_event"}},"filter":{"bool":{"must":[{"range":{"@timestamp":{"from":1398434986844,"to":"now"}}}]}}}},"sort":[{"@timestamp":{"order":"desc"}},{"@timestamp":{"order":"desc"}}]}]]
at org.elasticsearch.search.SearchService.parseSource(SearchService.java:634)
at org.elasticsearch.search.SearchService.createContext(SearchService.java:507)
at org.elasticsearch.search.SearchService.createAndPutContext(SearchService.java:480)
at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:324)
at org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteFetch(SearchServiceTransportAction.java:304)
at org.elasticsearch.action.search.type.TransportSearchQueryAndFetchAction$AsyncAction.sendExecuteFirstPhase(TransportSearchQueryAndFetchAction.java:71)
at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:216)
at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$4.run(TransportSearchTypeAction.java:296)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: org.elasticsearch.search.facet.FacetPhaseExecutionException: Facet [0]: (value) field [total.search.query_total] not found
at org.elasticsearch.search.facet.datehistogram.DateHistogramFacetParser.parse(DateHistogramFacetParser.java:186)
at org.elasticsearch.search.facet.FacetParseElement.parse(FacetParseElement.java:93)
at org.elasticsearch.search.SearchService.parseSource(SearchService.java:622)
... 10 more

It keeps repeating at regular intervals. Also this is observed in only one of the 2 nodes of the monitoring cluster. Usually it is the master which shows this exception.
Similar exceptions are observed in the Marvel dashboard - Cluster Overview page.

Also, on one of the master nodes in the development cluster, we see a ClusterBlockException [shard state 0 not initialized or recovered] for the monitoring cluster.

Please explain why this is happening. One more thing to add: we have been facing this problem ever since we migrated to ES 1.1.0. Before that, while running 1.0.0, we observed no such issues.

Looking forward to your reply.

Hi

Just to add to the above: lately we have observed that these exceptions vanish from the dashboard after some time and everything returns to normal.

So why do these exceptions occur at the beginning?

Hi Mihir,

This type of error typically occurs when the Marvel index doesn't contain
the right data. I'm intrigued by the ClusterBlockException on your
monitoring cluster.

Can you gist the output of: curl SERVER:9200/_cat/shards/?v for both nodes
of your Marvel cluster?

Thx,
Boaz

On Monday, April 28, 2014 2:43:30 PM UTC+2, Mihir M wrote:

View this message in context:
http://elasticsearch-users.115913.n3.nabble.com/SearchParseExceptions-in-Marvel-monitoring-cluster-tp4054926.html
Sent from the Elasticsearch Users mailing list archive at Nabble.com.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/e21279a2-62e9-4d08-9aed-f9d32c110da5%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Thanks Boaz for your reply.

Here is the output of curl SERVER:9200/_cat/shards/?v for both nodes of our Marvel cluster:

index              shard prirep state   docs store   ip          node
.marvel-2014.05.01 0     p      STARTED 70   865.4kb Server-ip-1 Marvel_1
.marvel-2014.05.01 0     r      STARTED 70   865kb   Server-ip-2 Marvel_2
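
For anyone who wants to compare this output across nodes without reading it by eye, here is a minimal Python sketch that parses the text of `_cat/shards?v` and lists any shard copy that is not STARTED. The function names are made up for illustration, and the sample text is just the output above pasted in; nothing here talks to a live cluster.

```python
def parse_cat_shards(text):
    """Parse whitespace-separated `_cat/shards?v` output into a list of dicts.

    The first non-empty line is treated as the header row. Rows with fewer
    columns than the header (for example unassigned shards, which have no
    node) simply end up with fewer keys.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def not_started(rows):
    """Return the rows whose shard state is anything other than STARTED."""
    return [row for row in rows if row.get("state") != "STARTED"]


# Sample input: the output posted above, copied as plain text.
sample = """\
index shard prirep state docs store ip node
.marvel-2014.05.01 0 p STARTED 70 865.4kb Server-ip-1 Marvel_1
.marvel-2014.05.01 0 r STARTED 70 865kb Server-ip-2 Marvel_2
"""

rows = parse_cat_shards(sample)
problem_rows = not_started(rows)
print(len(rows), "shard copies,", len(problem_rows), "not STARTED")
```

Running the same parse on the output captured from each node and comparing the resulting lists is one way to check whether both nodes report the same shard state.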

A few more things to highlight: on the Marvel Dashboard - Cluster Overview page we get the following errors:

  • "Oops! FacetPhaseExecutionException[Facet [0]: (value) field [total.search.query_total] not found]" --- in the Search Request Rate panel

  • "Oops! FacetPhaseExecutionException[Facet [timestamp]: failed to find mapping for index.raw]" --- in the Indices panel

  • "Oops! FacetPhaseExecutionException[Facet [0]: (value) field [primaries.indexing.index_total] not found]" --- in the Indexing Request Rate panel

  • "Oops! FacetPhaseExecutionException[Facet [0]: (value) field [primaries.docs.count] not found]" --- in the Document Count Panel

All of these are in addition to the SearchParseExceptions mentioned in my earlier post. Also, if Marvel is not storing the right data, how is that supposed to be handled?

That's the question :) These errors mean that ES got a request to facet on a field
it doesn't know. In the context of Marvel, that typically means some kind of
data shipping issue, so the fields never get created.

To make sure I understand correctly - the two nodes have identical output
for that command?
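
One way to confirm the "field not found" diagnosis is to look at the index mapping and check whether the fields the dashboard facets on are there. The following is a minimal Python sketch under some assumptions: the mapping is assumed to have the nested layout index -> "mappings" -> type -> "properties", as returned by a get-mapping request on ES 1.x; the sample mapping below is invented for illustration and is not real Marvel data; and the function names are hypothetical.

```python
def has_field(properties, dotted_path):
    """Return True if a dotted field path exists in a mapping 'properties' dict."""
    node = properties
    parts = dotted_path.split(".")
    for i, part in enumerate(parts):
        field = node.get(part)
        if field is None:
            return False
        # Descend into object fields for every part except the last one.
        if i < len(parts) - 1:
            node = field.get("properties", {})
    return True


def missing_fields(mapping_response, index, doc_type, fields):
    """List the fields that are absent from the mapping of one index and type."""
    type_mapping = mapping_response.get(index, {}).get("mappings", {}).get(doc_type, {})
    properties = type_mapping.get("properties", {})
    return [f for f in fields if not has_field(properties, f)]


# Invented sample mapping (assumption): only primaries.docs.count was created.
sample_mapping = {
    ".marvel-2014.05.01": {
        "mappings": {
            "indices_stats": {
                "properties": {
                    "@timestamp": {"type": "date"},
                    "primaries": {
                        "properties": {
                            "docs": {"properties": {"count": {"type": "long"}}}
                        }
                    },
                }
            }
        }
    }
}

# Field names taken from the dashboard errors reported in this thread.
wanted = [
    "total.search.query_total",
    "primaries.indexing.index_total",
    "primaries.docs.count",
]
missing = missing_fields(sample_mapping, ".marvel-2014.05.01", "indices_stats", wanted)
print("missing fields:", missing)
```

If a field shows up as missing for the current day's index, the documents that would have created it have probably not been shipped yet, which would fit the errors going away after a while.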

On Wed, Apr 30, 2014 at 1:07 PM, Mihir M mihirsm90@gmail.com wrote:

Hi

Just to add to above, lately we observe that these exceptions vanish from
the dashboard after some time and everything returns to normal.

So why do these exceptions occur at the beginning?

