Failed shards with top_hits

Hi,

I am using ES 5.5.0 with a single index that has the default 5 shards. When I run a query with the following aggregation:
"attributes":{
"global":{

     },
     "aggs":{  
        "attributes_products":{  
           "filter":{  
              "bool":{  
                 "must":[  
                 	some filters
                 ]
              }
           },
           "aggs":{  
              "attributes":{  
                 "nested":{  
                    "path":"attributes"
                 },
                 "aggs":{  
                    "code":{  
                       "terms":{  
                          "field":"attributes.code",
                          "size":40
                       },
                       "aggs":{  
                          "translations":{  
                             "nested":{  
                                "path":"attributes.translated_fields"
                             },
                             "aggs":{  
                                "sk":{  
                                   "nested":{  
                                      "path":"attributes.translated_fields.sk"
                                   },
                                   "aggs":{  
                                      "value":{  
                                         "terms":{  
                                            "field":"attributes.translated_fields.sk.value",
                                            "size":40
                                         },
                                         "aggs":{  
                                            "source":{  
                                               "aggs":{  
                                                  "top_attributes":{  
                                                     "top_hits":{  
                                                        "size":1
                                                     }
                                                  },
                                                  "products":{  
                                                     "reverse_nested":{  

                                                     },
                                                     "aggs":{  
                                                        "cardinality":{  
                                                           "cardinality":{  
                                                              "field":"parent_id"
                                                           }
                                                        }
                                                     }
                                                  }
                                               },
                                               "reverse_nested":{  
                                                  "path":"attributes"
                                               }
                                            }
                                         }
                                      }
                                   }
                                }
                             }
                          }
                       }
                    }
                 }
              }
           }
        }
     }
  }

I run it with some filters on an index with about 5000 items. I end up with 3 failed shards and a very inaccurate cardinality value. If I delete the top_hits aggregation, all 5 shards succeed.

Is this a problem with deeply nested aggs, a problem with just this one index, or a problem with top_hits?

Can you show what causes the shard failure?

The reason is null and the type is null_pointer_exception.

Do you have the stack trace for it? I need to know where it happens.

I am trying to get a trace, but ?error_trace=true is not working and I can't find another solution. The result is still:

{
  "took": 42,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 1,
    "failed": 4,
    "failures": [
      {
        "shard": 0,
        "index": "shop",
        "node": "EWaVA2IoSyi0adHgxABZ1Q",
        "reason": {
          "type": "null_pointer_exception",
          "reason": null
        }
      }
    ]
  },
  "hits": 
  "aggregations"
}

How can I get the stack trace? I am running the query in Kibana.

OK, here it is:
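
For anyone reading a response like the one above from a script rather than in Kibana, here is a minimal Python sketch that lists the entries under `_shards.failures`. The function name `summarize_shard_failures` and the `sample` dict are illustrative and not part of any Elasticsearch client; the sample only mirrors the fields visible in the response above.

```python
def summarize_shard_failures(response):
    """Return one summary line per entry in response["_shards"]["failures"]."""
    shards = response.get("_shards", {})
    lines = []
    for failure in shards.get("failures", []):
        # "reason" is an object with "type" and "reason" keys in the response above.
        reason = failure.get("reason") or {}
        lines.append(
            "shard {} of index {}: {} ({})".format(
                failure.get("shard"),
                failure.get("index"),
                reason.get("type"),
                reason.get("reason"),
            )
        )
    return lines


# Sample data copied from the fields shown in the response above.
sample = {
    "_shards": {
        "total": 5,
        "successful": 1,
        "failed": 4,
        "failures": [
            {
                "shard": 0,
                "index": "shop",
                "node": "EWaVA2IoSyi0adHgxABZ1Q",
                "reason": {"type": "null_pointer_exception", "reason": None},
            }
        ],
    }
}

for line in summarize_shard_failures(sample):
    print(line)
```

This only summarizes what the response already contains; it does not recover a stack trace.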

org.elasticsearch.transport.RemoteTransportException: [EWaVA2I][127.0.0.1:9300][indices:data/read/search[phase/query]]
Caused by: java.lang.NullPointerException
        at org.elasticsearch.search.aggregations.bucket.BestBucketsDeferringCollector.prepareSelectedBuckets(BestBucketsDeferringCollector.java:160) ~[elasticsearch-5.5.0.jar:5.5.0]
        at org.elasticsearch.search.aggregations.bucket.DeferringBucketCollector.replay(DeferringBucketCollector.java:44) ~[elasticsearch-5.5.0.jar:5.5.0]
        at org.elasticsearch.search.aggregations.AggregatorBase.runDeferredCollections(AggregatorBase.java:206) ~[elasticsearch-5.5.0.jar:5.5.0]
        at org.elasticsearch.search.aggregations.bucket.terms.LongTermsAggregator.buildAggregation(LongTermsAggregator.java:156) ~[elasticsearch-5.5.0.jar:5.5.0]
        at org.elasticsearch.search.aggregations.AggregatorFactory$MultiBucketAggregatorWrapper.buildAggregation(AggregatorFactory.java:147) ~[elasticsearch-5.5.0.jar:5.5.0]
        at org.elasticsearch.search.aggregations.bucket.BucketsAggregator.bucketAggregations(BucketsAggregator.java:116) ~[elasticsearch-5.5.0.jar:5.5.0]
        at org.elasticsearch.search.aggregations.bucket.nested.NestedAggregator.buildAggregation(NestedAggregator.java:97) ~[elasticsearch-5.5.0.jar:5.5.0]
        at org.elasticsearch.search.aggregations.AggregatorFactory$MultiBucketAggregatorWrapper.buildAggregation(AggregatorFactory.java:147) ~[elasticsearch-5.5.0.jar:5.5.0]
        at org.elasticsearch.search.aggregations.bucket.BucketsAggregator.bucketAggregations(BucketsAggregator.java:116) ~[elasticsearch-5.5.0.jar:5.5.0]
        at org.elasticsearch.search.aggregations.bucket.filter.FilterAggregator.buildAggregation(FilterAggregator.java:72) ~[elasticsearch-5.5.0.jar:5.5.0]
        at org.elasticsearch.search.aggregations.bucket.BucketsAggregator.bucketAggregations(BucketsAggregator.java:116) ~[elasticsearch-5.5.0.jar:5.5.0]
        at org.elasticsearch.search.aggregations.bucket.global.GlobalAggregator.buildAggregation(GlobalAggregator.java:59) ~[elasticsearch-5.5.0.jar:5.5.0]
        at org.elasticsearch.search.aggregations.AggregationPhase.execute(AggregationPhase.java:129) ~[elasticsearch-5.5.0.jar:5.5.0]
        at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:114) ~[elasticsearch-5.5.0.jar:5.5.0]
        at org.elasticsearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:248) ~[elasticsearch-5.5.0.jar:5.5.0]
        at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:263) ~[elasticsearch-5.5.0.jar:5.5.0]
        at org.elasticsearch.action.search.SearchTransportService$6.messageReceived(SearchTransportService.java:330) ~[elasticsearch-5.5.0.jar:5.5.0]
        at org.elasticsearch.action.search.SearchTransportService$6.messageReceived(SearchTransportService.java:327) ~[elasticsearch-5.5.0.jar:5.5.0]
        at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69) ~[elasticsearch-5.5.0.jar:5.5.0]
        at org.elasticsearch.transport.TransportService$7.doRun(TransportService.java:644) [elasticsearch-5.5.0.jar:5.5.0]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638) [elasticsearch-5.5.0.jar:5.5.0]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-5.5.0.jar:5.5.0]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_151]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_151]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]

Thanks for the stack trace. That might indeed be a bug; I will look into it and report back.

@mihalikv Thanks for reporting this. I have raised https://github.com/elastic/elasticsearch/issues/28394 in the Elasticsearch repo, as this seems like it's a bug, and will look into it further.

In the meantime, could you provide the mappings for this index (maybe in a Gist if it's very long), as it may help us reproduce the failure?

Thanks for the help. I am away from my computer; I will provide the mapping tomorrow.

Thanks for reporting @mihalikv. The workaround for now is to change the sort order of your top_hits aggregation. Using:

"top_attributes":{
   "top_hits":{
      "size":1,
      "sort": ["_doc"]
   }
}

should work fine and return the same result. The reason is that we don't handle retrieving the _score on a global aggregation that uses nested contexts, and the top_hits aggregation uses the _score as its default sort criterion.
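
If the request body is built in code, the workaround can be applied to every top_hits in the tree at once. The following is a minimal Python sketch, assuming the aggregations are held as a plain dict; the helper names `add_doc_sort_to_top_hits` and `_patch` are hypothetical, and the example dict is only a trimmed copy of the "source" part of the query above.

```python
import copy


def add_doc_sort_to_top_hits(aggs):
    """Return a copy of an "aggs" dict in which every top_hits
    aggregation without an explicit sort gets "sort": ["_doc"]."""
    result = copy.deepcopy(aggs)
    _patch(result)
    return result


def _patch(aggs):
    # aggs maps aggregation names to their definitions.
    for body in aggs.values():
        if not isinstance(body, dict):
            continue
        top_hits = body.get("top_hits")
        if isinstance(top_hits, dict):
            # Leave an explicit sort untouched; only replace the default.
            top_hits.setdefault("sort", ["_doc"])
        # Recurse into sub-aggregations under either accepted key.
        for key in ("aggs", "aggregations"):
            sub = body.get(key)
            if isinstance(sub, dict):
                _patch(sub)


# Trimmed copy of the "source" aggregation from the query above.
query_aggs = {
    "source": {
        "reverse_nested": {"path": "attributes"},
        "aggs": {
            "top_attributes": {"top_hits": {"size": 1}},
            "products": {
                "reverse_nested": {},
                "aggs": {"cardinality": {"cardinality": {"field": "parent_id"}}},
            },
        },
    }
}

patched = add_doc_sort_to_top_hits(query_aggs)
print(patched["source"]["aggs"]["top_attributes"])
```

The helper works on a deep copy, so the original dict is left unchanged and can still be sent as-is for comparison.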

Thanks for the solution.

I don't know whether to provide the mapping here or in the issue, but here is the mapping:
