I need to run real-time queries that fetch data from a dynamic index, one that can change at any time. The index holds millions of documents.
The Scroll API was useful to me because it keeps state and returns a consistent view of the data. But on a large number of records its performance was very poor, and it depends on the timeout (keep-alive) parameter, which is unacceptable for my purposes. The Search After API looks good, but it is stateless, so it can lead to missing or duplicated records in the results.
Is there some way to resolve this? Aggregation or filtering queries are not acceptable in my situation.
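To make the consistency concern concrete, here is a minimal in-memory sketch of how search_after-style paging behaves, assuming hits are sorted by a `timestamp` field plus a unique `doc_id` tiebreaker. The function `search_after_page` and the field names are hypothetical, and a plain Python list stands in for the index; this is not the Elasticsearch client. Because each request only carries the sort values of the last hit, a document inserted "behind" the cursor is never returned, and a document whose sort value moves forward can be returned twice.

```python
from bisect import bisect_right


def search_after_page(docs, size, after=None):
    """Return one page of docs sorted by (timestamp, doc_id), strictly after `after`.

    `docs` is a list of dicts with hypothetical 'timestamp' and 'doc_id' keys,
    standing in for the current (possibly changed) contents of the index.
    """
    ordered = sorted(docs, key=lambda d: (d["timestamp"], d["doc_id"]))
    keys = [(d["timestamp"], d["doc_id"]) for d in ordered]
    # Stateless: every call re-reads the current data and skips past the cursor.
    start = 0 if after is None else bisect_right(keys, after)
    page = ordered[start:start + size]
    # The sort values of the last hit become the cursor for the next request.
    next_after = keys[start + len(page) - 1] if page else None
    return page, next_after


if __name__ == "__main__":
    docs = [{"timestamp": t, "doc_id": f"d{t}"} for t in range(1, 7)]
    page1, cursor = search_after_page(docs, 3)
    # A new document lands behind the cursor: later pages will never include it.
    docs.append({"timestamp": 2, "doc_id": "late"})
    page2, cursor = search_after_page(docs, 3, cursor)
    print([d["doc_id"] for d in page1], [d["doc_id"] for d in page2])
```

Giving each hit a unique tiebreaker prevents ties from producing gaps or repeats on its own; the remaining anomalies in this sketch come only from the data changing between requests.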
I'm using the Scroll API to simulate pagination in my back-end, so I don't know in advance when my user will request the next page of data. That means the scroll context timeout has to be as long as possible, but I don't think that is an acceptable way to solve my problem (for example, setting timeout = 1h).
I see. I can't really think of another option for you.
I'd probably use a 10-minute timeout (or whatever suits you), but whenever I detect that the user is not going to the next page (running another search, exiting the app, ...) I'd explicitly clear their scroll id.
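A minimal back-end sketch of that approach, assuming one open scroll per user. `ScrollRegistry` is hypothetical, and `clear_scroll` is a caller-supplied callback standing in for whatever issues the real clear-scroll request. The sketch also assumes each scroll request renews the keep-alive, so the idle timer restarts whenever a page is fetched.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class ScrollRegistry:
    """Tracks one open scroll id per user and releases it as early as possible."""

    def __init__(self, clear_scroll: Callable[[str], None],
                 keep_alive_s: float = 600.0,
                 clock: Callable[[], float] = time.monotonic):
        self._clear = clear_scroll          # hypothetical hook: frees the scroll context
        self._keep_alive = keep_alive_s     # e.g. the 10-minute timeout
        self._clock = clock
        self._open: Dict[str, Tuple[str, float]] = {}  # user -> (scroll_id, last_used)

    def start(self, user: str, scroll_id: str) -> None:
        # A new search by the same user makes the old scroll useless: free it now.
        self.release(user)
        self._open[user] = (scroll_id, self._clock())

    def next_page(self, user: str) -> Optional[str]:
        """Return the user's scroll id for the next page, or None if it is gone."""
        entry = self._open.get(user)
        if entry is None:
            return None
        scroll_id, last_used = entry
        if self._clock() - last_used > self._keep_alive:
            # The context has already expired on the server; just forget it.
            del self._open[user]
            return None
        # Fetching a page renews the keep-alive, so restart the idle timer.
        self._open[user] = (scroll_id, self._clock())
        return scroll_id

    def release(self, user: str) -> None:
        """Call when the user leaves (exits the app, navigates away, ...)."""
        entry = self._open.pop(user, None)
        if entry is not None:
            self._clear(entry[0])
```

The timeout then only bounds the worst case. In the common case (a new search, or the user leaving), the context is cleared immediately instead of waiting for it to expire.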