A known-working ES 0.9.0 query may sometimes produce a ReduceSearchPhaseException

Has anyone else noticed an occasional ReduceSearchPhaseException when
querying stored data?

I uploaded data via REST into an ES 0.9.0 cluster (deployed on EC2 with
ec2_discovery and s3_gateway).

After PUTting a mapping for my type "biocompare/halcyon" and inserting 3
records, I was able to execute the following search from a terminal
successfully, producing good results:

$ curl -XPOST cloud:9200/biocompare/halcyon/_search -d ' { "from" : 0, "size" : 1, "sort" : ["Description.facet"], "query" : { "match_all" : { } }} '

{"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":3,"max_score":null,"hits":[{"_index":"biocompare","_type":"halcyon","_id":"50682","_score":null,"fields":{"item_name.facet":"Goat Anti-Rabbit IgG, Horseradish Peroxidase Conjugated"}}]}}

However, after inserting just a few more records, the same query produces a
ReduceSearchPhaseException, which is a bad result:

{"error":"ReduceSearchPhaseException[Failed to execute phase [query], [reduce] ]; nested: "}

How can this be? The same query against the same node seems to be broken
after inserting additional records!
Is it possible to break the query just by adding new records to the data
store?

Although each new record is acknowledged as "ok" in the JSON response, I
find that queries using "sort" and/or "fields" over user-mapped stored
fields will fail at runtime. This may happen after any insert (the 1st or
the 3001st -- it is not consistent).

However, queries that use only the default stored fields (e.g. "_id" and
"_source") are still fine and continue to return good results.
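
For illustration, this is the kind of contrast I mean (a sketch against the
mapping above; the exact documents do not seem to matter):

# still fine: only default stored fields, no sort on a user-mapped field
$ curl -XPOST cloud:9200/biocompare/halcyon/_search -d '
{ "fields" : ["_id", "_source"], "query" : { "match_all" : { } } }
'

# intermittently fails with the ReduceSearchPhaseException after more inserts
$ curl -XPOST cloud:9200/biocompare/halcyon/_search -d '
{ "fields" : ["item_name.facet"], "sort" : ["Description.facet"], "query" : { "match_all" : { } } }
'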

Does this look familiar to anyone? Please let me know, and I will quickly
provide any additional info you might need (environment, logs, uploaded
data, etc.) to help pin down the problem.

Regards, Jack Key


Hi,

In the logs, you should see the actual failure for the search (with a
stack trace). Is there a chance that you can send it? Also, based on your
description of the problem, I am pretty sure you can recreate it simply with
2 nodes on a local machine using curl. Is there a chance that you can do
that so I can have a look at it?
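
If it helps, this is roughly what I have in mind for the local setup (a
sketch, assuming the default multicast zen discovery and two unpacked copies
of the 0.9.0 distribution; -f keeps each node in the foreground):

# terminal 1
$ ./node1/bin/elasticsearch -f

# terminal 2
$ ./node2/bin/elasticsearch -f

# the two nodes should discover each other and form one cluster;
# then run the same curl commands against localhost:9200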

-shay.banon


I will try to reproduce the tests with two nodes on a local machine tomorrow morning.

A quick question first.
I have un-commented the following lines in the default config/logging.yml for my last tests:

  1. action:DEBUG
  2. gateway:DEBUG
  3. index.shard.recovery:DEBUG

Using that configuration, I can catch other stack traces for other queries (ones with syntax errors for instance), but not the one that sometimes produces the ReduceSearchPhaseException. Is there any way to turn up the logging further to potentially catch the stack trace?
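
In case it matters, this is roughly what I would try next (just a guess on
my part, reusing the log4j-style levels from the default logging.yml):

rootLogger: DEBUG, console, file

logger:
  action: TRACE
  gateway: TRACE
  index.shard.recovery: TRACE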

Regards, Jack.

These logging settings should catch the mentioned problem (actually, just
the action: DEBUG one is needed). Maybe there is a problem there as well. If
you recreate this, I will have a look at why it's not being logged, too.

-shay.banon


I think we can reproduce it. I was able to do so on a single node.

After starting elasticsearch 0.9.0 on a local machine (1 node), I issued the following commands to reproduce the ReduceSearchPhaseException. The exception was not logged in elasticsearch.log, but the second call to "./all_options_search.sh exampletype" did produce the error.

I am including every shell script used, except for put_1000_items.sh, which is large (622KB). You should be able to download it from

http://docs.google.com/leaf?id=0B3oSbYu7lzqJYTU2MTIyY2EtYzk2Zi00MDBkLWFiMTEtMjU5OGRkZmI2ZGFl&hl=en

<typed_into_terminal>
./create_index.sh biocompare
./x_map_type.sh exampletype
./all_options_search.sh exampletype
./put_1000_items.sh
./all_options_search.sh exampletype
</typed_into_terminal>

<logging.yml>
rootLogger: INFO, console, file
logger:
  # log action execution errors for easier debugging
  action: DEBUG

  # reduce the logging for aws, too much is logged under the default INFO
  com.amazonaws: WARN

  # gateway
  gateway: DEBUG
  index.gateway: DEBUG

  # peer shard recovery
  index.shard.recovery: DEBUG

appender:
  console:
    type: console
    layout:
      type: consolePattern
      conversionPattern: "[%d{ABSOLUTE}][%-5p][%-25c] %m%n"

  file:
    type: dailyRollingFile
    file: ${path.logs}/${cluster.name}.log
    datePattern: "'.'yyyy-MM-dd"
    layout:
      type: pattern
      conversionPattern: "[%d{ABSOLUTE}][%-5p][%-25c] %m%n"
</logging.yml>

<create_index.sh>
#!/bin/bash

cat $0 | grep -v '^#' | grep -v '^cat'

curl -XPUT localhost:9200/$1/foo/1 -d '
{
"bar" : "orly?"
}
'
</create_index.sh>

<x_map_type.sh>
#!/bin/bash

cat $0 | grep -v '^#' | grep -v '^cat'

# note: the type name has to be spliced into the JSON outside the single
# quotes, otherwise the literal string "$1" is sent as the mapping name
curl -XPUT localhost:9200/biocompare/$1/_mapping -d '
{
  "'$1'" : {
    "properties" : {
      "item_name" : { "type" : "multi_field", "fields" : { "default" : { "type" : "string", "store" : "no", "index" : "analyzed" }, "facet" : { "type" : "string", "store" : "yes", "index" : "not_analyzed" } } },

      "price" : { "type" : "multi_field", "fields" : { "default" : { "type" : "float", "store" : "no", "index" : "analyzed" }, "facet" : { "type" : "string", "store" : "yes", "index" : "not_analyzed" } } },

      "Description" : { "type" : "multi_field", "fields" : { "default" : { "type" : "string", "store" : "no", "index" : "analyzed" }, "facet" : { "type" : "string", "store" : "yes", "index" : "not_analyzed" } } }
    }
  }
}
'
</x_map_type.sh>

<all_options_search.sh>
#!/bin/bash

curl -XPOST localhost:9200/biocompare/$1/_search -d '
{
  "from" : 0,
  "size" : 1,
  "fields" : ["_id", "price.facet"],
  "sort" : {
    "Description.facet" : { }
  },
  "query" : {
    "match_all" : { }
  },
  "facets" : {
    "vendor" : {
      "terms" : {
        "field" : "price.facet",
        "size" : 100,
        "analyzer" : "none"
      },
      "global" : false
    }
  }
}
'
</all_options_search.sh>

<elasticsearch.log>
[09:53:21,422][INFO ][node ] [Black Jack Tarr] {elasticsearch/0.9.0}[8936]: initializing ...
[09:53:21,426][INFO ][plugins ] [Black Jack Tarr] loaded []
[09:53:22,714][INFO ][node ] [Black Jack Tarr] {elasticsearch/0.9.0}[8936]: initialized
[09:53:22,714][INFO ][node ] [Black Jack Tarr] {elasticsearch/0.9.0}[8936]: starting ...
[09:53:22,825][INFO ][transport ] [Black Jack Tarr] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/10.4.5.45:9300]}
[09:53:25,865][INFO ][cluster.service ] [Black Jack Tarr] new_master [Black Jack Tarr][e905d29e-2c16-4e24-84e8-05c6df20ee58][inet[/10.4.5.45:9300]], reason: zen-disco-join (elected_as_master)
[09:53:25,902][INFO ][discovery ] [Black Jack Tarr] elasticsearch/e905d29e-2c16-4e24-84e8-05c6df20ee58
[09:53:25,903][DEBUG][gateway ] [Black Jack Tarr] reading state from gateway none ...
[09:53:25,903][DEBUG][gateway ] [Black Jack Tarr] read state from gateway none, took 0s
[09:53:25,903][DEBUG][gateway ] [Black Jack Tarr] no state read from gateway
[09:53:25,904][DEBUG][gateway ] [Black Jack Tarr] writing to gateway none ...
[09:53:25,905][DEBUG][gateway ] [Black Jack Tarr] wrote to gateway none, took 0s
[09:53:25,908][INFO ][http ] [Black Jack Tarr] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/10.4.5.45:9200]}
[09:53:26,191][INFO ][jmx ] [Black Jack Tarr] bound_address {service:jmx:rmi:///jndi/rmi://:9400/jmxrmi}, publish_address {service:jmx:rmi:///jndi/rmi://10.4.5.45:9400/jmxrmi}
[09:53:26,192][INFO ][node ] [Black Jack Tarr] {elasticsearch/0.9.0}[8936]: started
[09:56:12,813][INFO ][cluster.metadata ] [Black Jack Tarr] [biocompare] creating index, cause [auto(index api)], shards [5]/[1], mappings []
[09:56:13,140][DEBUG][gateway ] [Black Jack Tarr] writing to gateway none ...
[09:56:13,140][DEBUG][gateway ] [Black Jack Tarr] wrote to gateway none, took 0s
[09:56:13,238][DEBUG][index.gateway ] [Black Jack Tarr] [biocompare][0] starting recovery from none ...
[09:56:13,288][DEBUG][index.gateway ] [Black Jack Tarr] [biocompare][1] starting recovery from none ...
[09:56:13,290][DEBUG][index.gateway ] [Black Jack Tarr] [biocompare][0] recovery completed from none, took [51ms], throttling_wait [0s]
index : recovered_files [0] with total_size [0b], took [0s], throttling_wait [0s]
: reusing_files [0] with total_size [0b]
translog : number_of_operations [0], took [0s]
[09:56:13,291][DEBUG][index.gateway ] [Black Jack Tarr] [biocompare][1] recovery completed from none, took [3ms], throttling_wait [0s]
index : recovered_files [0] with total_size [0b], took [0s], throttling_wait [0s]
: reusing_files [0] with total_size [0b]
translog : number_of_operations [0], took [0s]
[09:56:13,304][DEBUG][index.gateway ] [Black Jack Tarr] [biocompare][2] starting recovery from none ...
[09:56:13,306][DEBUG][index.gateway ] [Black Jack Tarr] [biocompare][2] recovery completed from none, took [2ms], throttling_wait [0s]
index : recovered_files [0] with total_size [0b], took [0s], throttling_wait [0s]
: reusing_files [0] with total_size [0b]
translog : number_of_operations [0], took [0s]
[09:56:13,318][DEBUG][index.gateway ] [Black Jack Tarr] [biocompare][3] starting recovery from none ...
[09:56:13,321][DEBUG][index.gateway ] [Black Jack Tarr] [biocompare][3] recovery completed from none, took [2ms], throttling_wait [0s]
index : recovered_files [0] with total_size [0b], took [0s], throttling_wait [0s]
: reusing_files [0] with total_size [0b]
translog : number_of_operations [0], took [0s]
[09:56:13,334][DEBUG][index.gateway ] [Black Jack Tarr] [biocompare][4] starting recovery from none ...
[09:56:13,338][DEBUG][index.gateway ] [Black Jack Tarr] [biocompare][4] recovery completed from none, took [4ms], throttling_wait [0s]
index : recovered_files [0] with total_size [0b], took [0s], throttling_wait [0s]
: reusing_files [0] with total_size [0b]
translog : number_of_operations [0], took [0s]
[09:56:13,452][INFO ][cluster.metadata ] [Black Jack Tarr] [biocompare] update_mapping [foo] (dynamic)
[09:56:13,452][DEBUG][gateway ] [Black Jack Tarr] writing to gateway none ...
[09:56:13,452][DEBUG][gateway ] [Black Jack Tarr] wrote to gateway none, took 0s
[09:56:34,455][INFO ][cluster.metadata ] [Black Jack Tarr] [biocompare] create_mapping [exampletype]
[09:56:34,461][DEBUG][gateway ] [Black Jack Tarr] writing to gateway none ...
[09:56:34,461][DEBUG][gateway ] [Black Jack Tarr] wrote to gateway none, took 0s
[09:58:59,178][INFO ][cluster.metadata ] [Black Jack Tarr] [biocompare] update_mapping [exampletype] (dynamic)
[09:58:59,179][DEBUG][gateway ] [Black Jack Tarr] writing to gateway none ...
[09:58:59,179][DEBUG][gateway ] [Black Jack Tarr] wrote to gateway none, took 0s
[09:58:59,199][INFO ][cluster.metadata ] [Black Jack Tarr] [biocompare] update_mapping [exampletype] (dynamic)
[09:58:59,200][DEBUG][gateway ] [Black Jack Tarr] writing to gateway none ...
[09:58:59,200][DEBUG][gateway ] [Black Jack Tarr] wrote to gateway none, took 0s
[09:59:00,237][INFO ][cluster.metadata ] [Black Jack Tarr] [biocompare] update_mapping [exampletype] (dynamic)
[09:59:00,237][DEBUG][gateway ] [Black Jack Tarr] writing to gateway none ...
[09:59:00,237][DEBUG][gateway ] [Black Jack Tarr] wrote to gateway none, took 0s
[09:59:06,395][INFO ][cluster.metadata ] [Black Jack Tarr] [biocompare] update_mapping [exampletype] (dynamic)
[09:59:06,395][DEBUG][gateway ] [Black Jack Tarr] writing to gateway none ...
[09:59:06,396][DEBUG][gateway ] [Black Jack Tarr] wrote to gateway none, took 0s
[09:59:08,296][INFO ][cluster.metadata ] [Black Jack Tarr] [biocompare] update_mapping [exampletype] (dynamic)
[09:59:08,297][DEBUG][gateway ] [Black Jack Tarr] writing to gateway none ...
[09:59:08,297][DEBUG][gateway ] [Black Jack Tarr] wrote to gateway none, took 0s
</elasticsearch.log>

Updated the link to put_1000_items.sh
http://docs.google.com/leaf?id=0B3oSbYu7lzqJYTU2MTIyY2EtYzk2Zi00MDBkLWFiMTEtMjU5OGRkZmI2ZGFl&hl=en

I think I have seen this error too. See here:
http://elasticsearch-users.115913.n3.nabble.com/script-fields-issue-tp1006690p1007001.html

==== start of excerpt
Now, "sort" : ["test1"] (test1 is date converted to millis) works (note: I
was not able to make it reverse:true or desc) but "sort" : ["test2"] still
fails with simple message:
{
"error" : "ReduceSearchPhaseException[Failed to execute phase [query],
[reduce] ]; nested: "
}
(no error in server log).
==== end

So this should be reproducible with only one small index populated with
three super-simple documents. The mentioned thread also had to do with
sorting, so maybe the issue is there.

Regards,
Lukas


Hi,

I ran the example and found the problem. First, the exception is now
properly logged (there was a case where a failed search would not be
logged). Second, the failure happens when searching and sorting on a field
that has only null values for all the hits returned (as is the case with the
above example). I fixed it: "Search fails when sorting on a field that has
no values in the hits returned" (elasticsearch issue #285).
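
For reference, the shape of a minimal reproduction is something like this (a
rough sketch, not the exact scripts above; the essential ingredient is that
none of the returned hits has a value for the sorted field):

# create the index with a document that has no "Description" field
$ curl -XPUT localhost:9200/test/doc/1 -d '{ "item_name" : "no description here" }'

# map "Description" so it is a known, sortable field
$ curl -XPUT localhost:9200/test/doc/_mapping -d '
{ "doc" : { "properties" : { "Description" : { "type" : "string", "index" : "not_analyzed" } } } }
'

# add another document that also omits "Description"
$ curl -XPUT localhost:9200/test/doc/2 -d '{ "item_name" : "still nothing" }'

# sort on "Description": every hit returned has only null values for it
$ curl -XPOST localhost:9200/test/doc/_search -d '
{ "size" : 1, "sort" : ["Description"], "query" : { "match_all" : { } } }
'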

A few notes on the search you are running. I think that you might find the
histogram facet a better fit than the terms facet for things like prices
(numeric fields). I plan to add a range facet as well for completely custom
"interval" definitions (a set of from and to values). Also, if you execute a
query just for the facets, you can set the size to 0.
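
For example, something like this (a sketch only; the interval is arbitrary,
and it assumes the numeric "price" sub-field from the mapping above):

$ curl -XPOST localhost:9200/biocompare/exampletype/_search -d '
{
  "size" : 0,
  "query" : { "match_all" : { } },
  "facets" : {
    "prices" : {
      "histogram" : {
        "field" : "price",
        "interval" : 100
      }
    }
  }
}
'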

-shay.banon


Excellent, I am not getting the ReduceSearchPhaseException now. I was using
one of my scripts that was firing this exception (without any server log
messages), and after this commit it works without exceptions.

Lukas


Hiya


Hopefully that fix will also resolve this longstanding issue: "Casting
errors under load" (elasticsearch issue #120).

I'll let you know once I've upgraded to 0.9 in live

++Jack Key for writing a test case!

clint


It seems to work correctly after repeating the same test on the newest snapshot.

Thanks, everyone!

It seems that this bug has reappeared in ES 0.10.0.
If you apply the same steps from the use case above to the ES 0.10.0 distribution, you get a NullPointerException (instead of the original ReduceSearchPhaseException). Also, as before, the exception is not logged.

<example_output>
./all_options_search.sh exampletype
{"error":"NullPointerException[null]"}
</example_output>

Regards, Jack Key

Yes, you are right. I just pushed a fix; this time the failure happened at a
higher layer, when building the REST response and handling null sort values
(they are now returned for each hit).
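
So a hit in a sorted response should now carry its sort values even when
they are null, roughly along these lines (illustrative shape only, not
actual output):

{"_index":"biocompare","_type":"exampletype","_id":"42","_score":null,"sort":[null],"fields":{"_id":"42"}}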

-shay.banon
