We have 27K documents ingested into our engine, and we are trying to make a REST API call to gather the distinct values of a field across all the documents, which would be 27K IDs. How can I get that array of IDs as a response? Currently I can only get 10K results. How do I get the other 17K IDs?
When I use current 11 and size 1000, it returns empty results. How can this be achieved in App Search?
Unfortunately 10,000 is a hard limit in App Search.
You may be able to get what you want using the Elasticsearch API. Note that by default you'll still run into a max result window of 10,000 but this can be configured in Elasticsearch using index.max_result_window. As noted in the documentation, scroll may be more efficient to get large document sets.
@Kathleen_DeRusso But if I want to compare the data between two engines to see whether a specific field exists in one engine and not in the other, I need to view all the documents, right? Are you saying there is no way to view all the documents in an engine via a REST API call? In my scenario I just want to view them for analytics purposes, so performance doesn't matter to me.
@Kathleen_DeRusso
Is there any other way? Please let me know. This is to compare something outside of the code. We are only making some REST API calls, so there is no need for me to worry about performance.
This is not a use case that App Search was optimized for, so expect very slow performance and I can't guarantee there won't be timeouts or errors doing this based on your data. If you run into errors the only other option is to use the scroll/ES queries I noted above.
Both Elasticsearch and App Search limit the number of results a single query can return, to prevent excessive resource usage. In Elasticsearch the default limit is 10,000 results, controlled by index.max_result_window. In App Search, as noted above, 10,000 is a hard limit.
To retrieve more than 10,000 results through the Elasticsearch API, you can consider the following options:
Pagination:
Use the size and from parameters to paginate through the results.
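For example, to retrieve results 10,001 to 20,000, set size to 10,000 and from to 10,000. Note that from + size must not exceed index.max_result_window (10,000 by default), so this particular request is rejected unless that setting has been raised first (see the last option below).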
{
  "size": 10000,
  "from": 10000,
  "query": {
    "match_all": {}
  }
}
Keep in mind that deep pagination can be resource-intensive, and performance may degrade as you go deeper.
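As a rough illustration of the from/size arithmetic, here is a minimal Python sketch that generates one request body per page and refuses any page that would cross the result window. The function name page_bodies and the match_all query are assumptions made for the example. Only from, size and index.max_result_window come from Elasticsearch itself.

```python
from typing import Dict, Iterator


def page_bodies(total: int, size: int, max_window: int = 10_000) -> Iterator[Dict]:
    """Yield one from/size search body per page.

    Raises ValueError as soon as a page would need from + size to exceed
    max_window, which mirrors the check Elasticsearch applies against
    index.max_result_window.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, total, size):
        page_size = min(size, total - start)  # last page may be shorter
        if start + page_size > max_window:
            raise ValueError(
                f"from={start} + size={page_size} exceeds "
                f"max_result_window={max_window}"
            )
        yield {"from": start, "size": page_size, "query": {"match_all": {}}}


if __name__ == "__main__":
    # 27,000 documents, assuming the window has been raised to 30,000.
    bodies = list(page_bodies(27_000, 1_000, max_window=30_000))
    print(len(bodies), bodies[-1]["from"])
```

With the default window of 10,000 the same call fails at from=10000, which is the wall described in the question.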
Scroll API:
Use the Scroll API to retrieve large result sets.
The Scroll API allows you to keep a "search context" open and continue retrieving results until all documents are processed.
This is more efficient than using from for pagination.
POST /your_index/_search?scroll=5m
{
  "size": 1000,
  "query": {
    "match_all": {}
  }
}
After the initial request, the response includes a _scroll_id. Pass it to the scroll endpoint to retrieve the next batch, and repeat until a response comes back with no hits.
POST /_search/scroll
{
  "scroll": "5m",
  "scroll_id": "your_scroll_id"
}
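The scroll loop can be sketched in Python roughly as follows. This is only an outline under stated assumptions: send is a hypothetical callable (method, path, body) that performs the HTTP request against your cluster and returns the parsed JSON response, and index is a placeholder for whichever index you query. The request paths and the _scroll_id and hits.hits response fields are the standard Scroll API ones.

```python
from typing import Any, Callable, Dict, Iterator, Optional

# Hypothetical transport: send(method, path, body) -> parsed JSON response.
Send = Callable[[str, str, Optional[Dict[str, Any]]], Dict[str, Any]]


def scroll_ids(send: Send, index: str, page_size: int = 1000,
               keep_alive: str = "5m") -> Iterator[str]:
    """Yield the _id of every document in `index` using the Scroll API."""
    # Initial search opens the scroll context; _source is disabled
    # because only the IDs are needed.
    response = send(
        "POST",
        f"/{index}/_search?scroll={keep_alive}",
        {"size": page_size, "_source": False, "query": {"match_all": {}}},
    )
    scroll_id = response.get("_scroll_id")
    try:
        # An empty hits array means the scroll is exhausted.
        while response["hits"]["hits"]:
            for hit in response["hits"]["hits"]:
                yield hit["_id"]
            response = send(
                "POST",
                "/_search/scroll",
                {"scroll": keep_alive, "scroll_id": scroll_id},
            )
            # The scroll ID can change between calls, so keep the latest.
            scroll_id = response.get("_scroll_id", scroll_id)
    finally:
        # Release the search context instead of waiting for it to expire.
        if scroll_id is not None:
            send("DELETE", "/_search/scroll", {"scroll_id": scroll_id})
```

Injecting send keeps the sketch independent of any particular HTTP client. In practice it could wrap whatever client you already use to reach the cluster.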
Increase index.max_result_window:
Elasticsearch has a setting called index.max_result_window that caps from + size for a search request against the index, which effectively limits how many results regular pagination can reach. The default is 10,000.
Be cautious with this approach, as setting it too high might lead to increased memory usage.
PUT /your_index/_settings
{
  "index.max_result_window": 20000
}
After adjusting this setting, you can use the regular query with a larger size parameter.
Remember to consider the performance implications of your chosen method, and choose the approach that best fits your use case and infrastructure.
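For the original goal of comparing two engines, once the full list of IDs (or field values) has been collected from each one by any of the methods above, the comparison itself is a set difference. A minimal Python sketch, where compare_ids and the sample IDs are made up for illustration:

```python
from typing import Dict, Iterable, List


def compare_ids(engine_a: Iterable[str],
                engine_b: Iterable[str]) -> Dict[str, List[str]]:
    """Report which IDs appear in only one engine and which appear in both."""
    a, b = set(engine_a), set(engine_b)
    return {
        "only_in_a": sorted(a - b),
        "only_in_b": sorted(b - a),
        "in_both": sorted(a & b),
    }


if __name__ == "__main__":
    # Illustrative IDs only; real input would be the lists gathered
    # from each engine.
    result = compare_ids(["doc-1", "doc-2", "doc-3"],
                         ["doc-2", "doc-3", "doc-4"])
    print(result)
```

At 27K IDs per engine the sets fit comfortably in memory, so no streaming is needed for this step.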