Failed to execute org.elasticsearch.action.search.SearchRequest error

Hey guys,

I noticed that results stopped turning up in Logstash. When I tailed the Elasticsearch logs, I saw this exception repeated over and over, 85 times in total:

    #grep "Failed to execute \[org.elasticsearch.action.search.SearchRequest@2c539ad4\] lastShard \[true\]" /var/log/elasticsearch/elasticsearch.log | wc -l
    85

I don't see these errors in the ES logs on the other two nodes; I only see them on the master node.

Here's the full output of the error:

[2015-07-11 20:23:56,239][DEBUG][action.search.type       ] [JF_ES1] All shards failed for phase: [query]
org.elasticsearch.transport.RemoteTransportException: [JF_ES2][inet[/10.10.10.26:9300]][indices:data/read/search[phase/query]]
Caused by: org.elasticsearch.search.SearchParseException: [logstash-2015.07.11][3]: from[-1],size[-1]: Parse Failure [Failed to parse source [{
  "query" : {
      "_id" : "AU56fE-yr7opIbuImhJ1"
 }
}]]
        at org.elasticsearch.search.SearchService.parseSource(SearchService.java:735)
        at org.elasticsearch.search.SearchService.createContext(SearchService.java:560)
        at org.elasticsearch.search.SearchService.createAndPutContext(SearchService.java:532)
        at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:294)
        at org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:776)
        at org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:767)
        at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.doRun(MessageChannelHandler.java:279)
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.index.query.QueryParsingException: [logstash-2015.07.11] [_na] query malformed, no field after start_object
        at org.elasticsearch.index.query.QueryParseContext.parseInnerQuery(QueryParseContext.java:295)
        at org.elasticsearch.index.query.IndexQueryParserService.innerParse(IndexQueryParserService.java:382)
        at org.elasticsearch.index.query.IndexQueryParserService.parse(IndexQueryParserService.java:281)
        at org.elasticsearch.index.query.IndexQueryParserService.parse(IndexQueryParserService.java:276)
        at org.elasticsearch.search.query.QueryParseElement.parse(QueryParseElement.java:33)
        at org.elasticsearch.search.SearchService.parseSource(SearchService.java:719)
... 10 more

Taking a look through the log, I tailed back around 500 lines and examined the output. All of the errors starting with "Failed to execute [org.elasticsearch.action.search.SearchRequest@2c539ad4]" have identical output, except that they are for different date-based Logstash indices. They all appear to relate to Java/Groovy errors.

If I run an Elasticsearch query using the "_id" value from that log entry, I receive an error that's probably the same one Logstash is getting when it queries Elasticsearch:

 #curl -XGET "http://localhost:9200/logstash-2015.07.11/_search?pretty=true" -d '{
>   "query" : {
>       "_id" : "AU56fE-yr7opIbuImhJ1"
>  }
> }'
{
  "error" : "SearchPhaseExecutionException[Failed to execute phase [query], all shards failed; shardFailures {[TxwvI4HLSPqC_E8AEBwzEQ][logstash-2015.07.11][0]: RemoteTransportException[[JF_ES3][inet[/66.147.235.108:9300]][indices:data/read/search[phase/query]]]; nested: SearchParseException[[logstash-2015.07.11][0]: from[-1],size[-1]: Parse Failure [Failed to parse source [{\n  \"query\" : {\n      \"_id\" : \"AU56fE-yr7opIbuImhJ1\"\n }\n}]]]; nested: QueryParsingException[[logstash-2015.07.11] [_na] query malformed, no field after start_object]; }{[pIegEsgDSAGF4Vgz4JZ42g][logstash-2015.07.11][1]: RemoteTransportException[[JF_ES2][inet[/10.10.10.26:9300]][indices:data/read/search[phase/query]]]; nested: SearchParseException[[logstash-2015.07.11][1]: from[-1],size[-1]: Parse Failure [Failed to parse source [{\n  \"query\" : {\n      \"_id\" : \"AU56fE-yr7opIbuImhJ1\"\n }\n}]]]; nested: QueryParsingException[[logstash-2015.07.11] [_na] query malformed, no field after start_object]; }{[8g0BOveySTyrp27oheNqsQ][logstash-2015.07.11][2]: SearchParseException[[logstash-2015.07.11][2]: from[-1],size[-1]: Parse Failure [Failed to parse source [{\n  \"query\" : {\n      \"_id\" : \"AU56fE-yr7opIbuImhJ1\"\n }\n}]]]; nested: QueryParsingException[[logstash-2015.07.11] [_na] query malformed, no field after start_object]; }{[pIegEsgDSAGF4Vgz4JZ42g][logstash-2015.07.11][3]: RemoteTransportException[[JF_ES2][inet[/10.10.10.26:9300]][indices:data/read/search[phase/query]]]; nested: SearchParseException[[logstash-2015.07.11][3]: from[-1],size[-1]: Parse Failure [Failed to parse source [{\n  \"query\" : {\n      \"_id\" : \"AU56fE-yr7opIbuImhJ1\"\n }\n}]]]; nested: QueryParsingException[[logstash-2015.07.11] [_na] query malformed, no field after start_object]; }{[8g0BOveySTyrp27oheNqsQ][logstash-2015.07.11][4]: SearchParseException[[logstash-2015.07.11][4]: from[-1],size[-1]: Parse Failure [Failed to parse source [{\n  \"query\" : {\n      \"_id\" : \"AU56fE-yr7opIbuImhJ1\"\n }\n}]]]; 
nested: QueryParsingException[[logstash-2015.07.11] [_na] query malformed, no field after start_object]; }]",
  "status" : 400
}

I'm also seeing these errors less frequently; they're probably related to the above:

[2015-07-11 20:43:54,300][DEBUG][action.search.type       ] [JF_ES1] All shards failed for phase: [query]
org.elasticsearch.transport.RemoteTransportException: [JF_ES2][inet[/10.10.10.26:9300]][indices:data/read/search[phase/query]]
Caused by: org.elasticsearch.search.SearchParseException: [logstash-2015.07.11][3]: from[-1],size[-1]: Parse Failure [Failed to parse source [{
  "query" : {
   "_id" : "AU56fE-yr7opIbuImhJ1"
  }
}]]
        at org.elasticsearch.search.SearchService.parseSource(SearchService.java:735)
        at org.elasticsearch.search.SearchService.createContext(SearchService.java:560)
        at org.elasticsearch.search.SearchService.createAndPutContext(SearchService.java:532)
        at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:294)
        at org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:776)
        at org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:767)
        at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.doRun(MessageChannelHandler.java:279)
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.index.query.QueryParsingException: [logstash-2015.07.11] [_na] query malformed, no field after start_object
        at org.elasticsearch.index.query.QueryParseContext.parseInnerQuery(QueryParseContext.java:295)
        at org.elasticsearch.index.query.IndexQueryParserService.innerParse(IndexQueryParserService.java:382)
        at org.elasticsearch.index.query.IndexQueryParserService.parse(IndexQueryParserService.java:281)
        at org.elasticsearch.index.query.IndexQueryParserService.parse(IndexQueryParserService.java:276)
        at org.elasticsearch.search.query.QueryParseElement.parse(QueryParseElement.java:33)
        at org.elasticsearch.search.SearchService.parseSource(SearchService.java:719)
        ... 10 more

And that error shares the same "_id" field as the previous error:

"_id" : "AU56fE-yr7opIbuImhJ1"

So I'm thinking that these may be related to Java/Groovy, and I was wondering if someone had a suggestion on how to correct this problem.

This is all I have in my elasticsearch.yml, after deleting the commented-out and empty lines from the output:

#grep -v '#' /etc/elasticsearch/elasticsearch.yml |sed '/^\s*$/d'
node.name: "JF_ES1"
node.master: true
node.data: true
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["localhost"]

My other two nodes in the cluster have this:

 #grep -v '#' /etc/elasticsearch/elasticsearch.yml |sed '/^\s*$/d'
node.name: "JF_ES2"
node.data: true
node.master: false
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["10.10.10.25"]

#grep -v '#' /etc/elasticsearch/elasticsearch.yml |sed '/^\s*$/d'
node.name: "JF_ES3"
node.master: false
node.data: true
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["10.10.10.25"]

It's not much. Pretty straightforward and basic from a configuration standpoint. I'm wondering if the errors I'm getting are indeed Java/Groovy-related.

I'd appreciate some help in getting past this problem!

Thanks

You probably don't want to search on _id, use a direct get instead.

However that shouldn't stop Logstash at all so I doubt they are related. What do your LS logs show?
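To illustrate the suggestion above with the index and `_id` from the log entries: the failing body nests `"_id"` directly under `"query"`, which isn't a query type, hence the "no field after start_object" parse error. A minimal sketch of two working alternatives under the 1.x API (the `ids` query and the get API's `_all` type are my assumptions about what fits here):

```shell
# Option 1: a well-formed search using an "ids" query:
QUERY='{ "query": { "ids": { "values": ["AU56fE-yr7opIbuImhJ1"] } } }'
# curl -XGET "http://localhost:9200/logstash-2015.07.11/_search?pretty" -d "$QUERY"

# Option 2 (the direct get): fetch the document by id, skipping the query
# phase entirely; the "_all" type matches the document regardless of type:
# curl -XGET "http://localhost:9200/logstash-2015.07.11/_all/AU56fE-yr7opIbuImhJ1?pretty"

# Sanity-check that the body is valid JSON before sending it:
echo "$QUERY" | python3 -m json.tool >/dev/null && echo "valid JSON"
```

The direct get is cheaper because it routes straight to the shard holding the document instead of fanning out to every shard.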

[quote="warkolm, post:2, topic:25373, full:true"]
You probably don't want to search on _id, use a direct get instead.

However that shouldn't stop Logstash at all so I doubt they are related. What do your LS logs show?
[/quote]

Hmm.. ok. Well on the Logstash side of things I see this message repeated:

{:timestamp=>"2015-07-10T20:39:54.411000-0400", :message=>"retrying failed action with response code: 429", :level=>:warn}

This happens 1830 times in the logs:

#grep "retrying failed action with response code: 429" /var/log/logstash/logstash.log | wc -l
1830

This is the first timestamp of when those messages started occurring:

2015-07-01T22:57:23.865000-0400

The only other errors I've seen are these:

{:timestamp=>"2015-07-01T22:52:58.248000-0400", :message=>"Exception in lumberjack input", :exception=>#<LogStash::ShutdownSignal: LogStash::ShutdownSignal>, :level=>:error}

Also, I notice that if I shut everything down and start it up again, Logstash will show results for the next few hours or so, but then it just stops showing results again.

It's a little annoying. But what do those errors that I see in ES mean? And how can I get everything to run more reliably?

429 is usually a queue full response, so it sounds like you are overloading your node.
Can you add more resources to it?
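One way to confirm this (my assumption, using the 1.x _cat API) is to watch the bulk thread pool's rejected counter climb alongside the 429s:

```shell
# Point at any node in the cluster; adjust host/port for your setup.
ES_HOST="${ES_HOST:-localhost:9200}"
# A "rejected" count that grows over time means the bulk queue is full and
# ES is pushing back on Logstash with 429 responses.
curl -s "http://$ES_HOST/_cat/thread_pool?v&h=host,bulk.active,bulk.queue,bulk.rejected" \
  || echo "cluster not reachable at $ES_HOST"
```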

Hmm ok, yeah that's probably the case!! I guess I can add some resources to the setup. But it's probably going to take some time.

The Logstash host is also the first ES node. It has 6GB of RAM and 45GB of disk. The other two ES nodes only have 2GB of RAM and 25GB of disk each. I have logstash-forwarder running on 25 Linux nodes of varying resources.

Is there any good documentation on how to properly allocate resources to an ELK setup? Got any recommendations?

How much heap have you assigned to ES?
How much data is in your cluster?
How are you monitoring things?

How would I determine that?

There's roughly 85GB of log data across the 25 hosts. Generally about 3GB of logs per host, with one host topping out at 8GB.

With a combination of nagios and munin.

That's not much data on a large number of hosts.

Are you using default settings?

Yep... just went with the defaults.

Right, cause that won't give ES much heap, 1GB max.

Check out https://www.elastic.co/guide/en/elasticsearch/guide/current/deploy.html
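On 1.x package installs the heap comes from the ES_HEAP_SIZE environment variable; a sketch, assuming the usual RPM/DEB file layout (exact paths vary by distro):

```shell
# In /etc/sysconfig/elasticsearch (RPM) or /etc/default/elasticsearch (DEB),
# give ES roughly half the machine's RAM. For the 6GB node:
ES_HEAP_SIZE=3g
# For the 2GB nodes:
# ES_HEAP_SIZE=1g
# Then restart the elasticsearch service so the JVM picks up the new heap.
```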

Got it! I'll check that out. Thanks!