Following an upgrade from 7.5.1 to 7.7.0 we've started seeing an unusual error with queries used to count the number of documents in an index.
In 7.7.0 this query now returns an error:
curl -X POST '127.0.0.1:9200/v33.tcpevent-000174/_search?pretty' -H content-type:application/json --data '{"size":0,"sort":[{"@timestamp":{"order":"desc"}},{"conn_uuid":{"order":"desc"}}],"track_total_hits":true}'
{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_argument_exception",
        "reason" : "numHits must be > 0; please use TotalHitCountCollector if you just need the total hit count"
      }
    ],
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "grouped" : true,
    "failed_shards" : [
      {
        "shard" : 1,
        "index" : "v33.tcpevent-000174",
        "node" : "j07kSSNmSFueB_E9i911MQ",
        "reason" : {
          "type" : "illegal_argument_exception",
          "reason" : "numHits must be > 0; please use TotalHitCountCollector if you just need the total hit count"
        }
      }
    ],
    "caused_by" : {
      "type" : "illegal_argument_exception",
      "reason" : "numHits must be > 0; please use TotalHitCountCollector if you just need the total hit count",
      "caused_by" : {
        "type" : "illegal_argument_exception",
        "reason" : "numHits must be > 0; please use TotalHitCountCollector if you just need the total hit count"
      }
    }
  },
  "status" : 400
}
In ES 7.5.1 the same query ran without error.
I can avoid the error either by specifying a non-zero size or by removing the second sort field. Both of these searches work:
{"size":0,"sort":[{"@timestamp":{"order":"desc"}}],"track_total_hits":true}
{"size":1,"sort":[{"@timestamp":{"order":"desc"}},{"conn_uuid":{"order":"desc"}}],"track_total_hits":true}
I suspect this is related to index sorting, as the index is configured with an index sort on {"@timestamp":{"order":"desc"}}.
Changing track_total_hits to a specific value gives the same error:
{"size":0,"sort":[{"@timestamp":{"order":"desc"}},{"conn_uuid":{"order":"desc"}}],"track_total_hits":true} # error
This looks like a bug to me. I've tried to reproduce it on a fresh instance with the same mappings and some test data, but it doesn't fail in the same way, so I'll keep looking and try to build a reproducible example.
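For anyone else attempting a reproduction, here is a rough sketch of how a test index with the same index sort could be created. Several things here are assumptions rather than details of my real setup: the index name `repro-index-sort`, the field types (`date` for @timestamp, `keyword` for conn_uuid), and the helper names. The `index.sort.field` and `index.sort.order` settings are the standard index sorting settings. The code only builds the request; pass the result to `urllib.request.urlopen` to actually send it.

```python
import json
import urllib.request


def index_sort_body():
    """Index creation body with an index sort on @timestamp, descending."""
    return {
        "settings": {
            "index": {"sort.field": "@timestamp", "sort.order": "desc"}
        },
        "mappings": {
            "properties": {
                # Field types are assumptions, not taken from the real mappings.
                "@timestamp": {"type": "date"},
                "conn_uuid": {"type": "keyword"},
            }
        },
    }


def create_index_request(base_url, index):
    """Build (but do not send) the PUT request that creates the test index."""
    data = json.dumps(index_sort_body()).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/{index}",
        data=data,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # Hypothetical index name; the host matches the one used in the curl command.
    req = create_index_request("http://127.0.0.1:9200", "repro-index-sort")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Since the failure above is reported on shard 1, the real index has more than one shard, so a reproduction may also need more than one shard and enough documents to populate each of them.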