[es/search] failed: [search_phase_execution_exception] all shards failed

Hello,

I would like to retrieve documents from an index that contains a large number of documents (about 1 million).
I am using ElasticsearchClient to connect to Elasticsearch and fetch the data. I tested the solution with a small number of documents and it works well, but I got an error when testing with size(105000). Do you have any idea how to solve the problem?

Below are the connection setup, the search request, and the error.

Please don't post pictures of text, logs or code. They are difficult to read, impossible to search and replicate (if it's code), and some people may not even be able to see them :slight_smile:

OK, thank you, noted. I updated the index settings to increase the result window:
PUT /MyIndex/_settings
{
  "index": {
    "max_result_window": 2100000
  }
}

I also updated the request in the code to get the exact total hits, by adding .trackTotalHits(t -> t.enabled(true)):

SearchResponse<MyClass> response = client.search(s -> s
        .index(index)
        .size(1900000)
        .trackTotalHits(t -> t.enabled(true))
        .query(QueryBuilders.matchAll().build()._toQuery()),
    MyClass.class
);

I now get more documents, but still fewer than the number of documents in the Elasticsearch index.

What is the specification of your cluster? How much heap do you have assigned?

Increasing that limit will put a lot more load on the cluster, and it is not clear that the cluster can handle it. Is there anything in the Elasticsearch logs?

You can use:

  • the size and from parameters to display up to 10000 records (the default limit) to your users. If you want to change this limit, you can change the index.max_result_window setting, but be aware of the consequences (i.e. memory).
  • the search_after feature to do deep pagination.
  • the Scroll API if you want to extract a result set to be consumed by another tool later. (Not recommended anymore.)
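
To make the search_after option more concrete, here is a minimal sketch of the paging loop in plain Java. It does not call Elasticsearch: the in-memory list stands in for the index, and `Doc`, `fetchPage` and `fetchAll` are hypothetical names invented for this illustration. The idea it shows is the one behind search_after: sort on a key that is unique per document (here a timestamp plus an id as tiebreaker), request one page, then pass the sort key of the last hit as the cursor for the next page until a page comes back empty.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the search_after paging loop. Nothing here talks to Elasticsearch;
// all names are hypothetical and the "index" is an in-memory list.
class SearchAfterSketch {

    // A document with a sort value and a unique id used as tiebreaker.
    static final class Doc {
        final long timestamp;
        final String id;

        Doc(long timestamp, String id) {
            this.timestamp = timestamp;
            this.id = id;
        }
    }

    // Total order: timestamp first, then id, so no two documents compare equal.
    static int compare(Doc a, Doc b) {
        int byTime = Long.compare(a.timestamp, b.timestamp);
        return byTime != 0 ? byTime : a.id.compareTo(b.id);
    }

    // Stand-in for one search request: returns up to pageSize documents that
    // sort strictly after the cursor (null means "start from the beginning").
    static List<Doc> fetchPage(List<Doc> sortedIndex, Doc after, int pageSize) {
        List<Doc> page = new ArrayList<>();
        for (Doc d : sortedIndex) {
            if (after == null || compare(d, after) > 0) {
                page.add(d);
                if (page.size() == pageSize) {
                    break;
                }
            }
        }
        return page;
    }

    // Pages through the whole index, using the last hit of each page as cursor.
    static List<Doc> fetchAll(List<Doc> index, int pageSize) {
        List<Doc> sorted = new ArrayList<>(index);
        sorted.sort(SearchAfterSketch::compare);

        List<Doc> all = new ArrayList<>();
        Doc after = null;
        while (true) {
            List<Doc> page = fetchPage(sorted, after, pageSize);
            if (page.isEmpty()) {
                break; // no more hits
            }
            all.addAll(page);
            after = page.get(page.size() - 1); // cursor for the next page
        }
        return all;
    }

    public static void main(String[] args) {
        List<Doc> index = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            index.add(new Doc(i / 5, "doc-" + i)); // five docs share each timestamp
        }
        System.out.println(fetchAll(index, 10).size()); // prints 25
    }
}
```

In this sketch, if the comparison used only the timestamp, documents sharing the timestamp of the last hit on a page would be skipped, so the total would come out too low. Whether that applies to a real query depends on the sort actually used in the request.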

Thank you for your response!
I tried search_after with a PIT (point in time). However, I ran into two issues: performance, and a total that is still too low.

  1. Performance: the query takes 4s in Dev Tools, but 10s through the Java API.
  2. I don't get the exact number of documents in the Elasticsearch index. I think the query does not take duplicates into account.

I cannot change the cluster parameters; they are fixed by the architects.
