Hi
I would like to query a huge amount of data (more than 500k records). To achieve this I used the scroll API and also search_after (sorted on a modified-time field).
In both cases I am getting the same records (duplicates) multiple times.
I tried both approaches but still get duplicates. The total record count matches the expected total, but since some records are duplicated, that means other records are being missed.
search_after can give duplicate entries while indexing/updating of documents is happening; scroll search, however, uses a point-in-time snapshot. Is there any chance you can reproduce that behaviour reliably with a small dataset that you can share, plus all the queries?
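As a side note: if the sort field alone is not unique (several documents can share the same modified time), search_after can also skip or repeat documents even without concurrent writes. Combining search_after with a point in time (PIT) and a tiebreaker sort avoids both problems. A minimal Dev Tools sketch, where the index name my-index and the field modified_time are placeholders for your own:

```
# 1. Open a point in time so every page reads the same snapshot
POST /my-index/_pit?keep_alive=1m

# 2. First page: target the PIT (no index in the URL) and add
#    _shard_doc as a tiebreaker next to the modified-time sort
GET /_search
{
  "size": 1000,
  "query": { "match_all": {} },
  "pit": { "id": "<pit id from step 1>", "keep_alive": "1m" },
  "sort": [
    { "modified_time": "asc" },
    { "_shard_doc": "asc" }
  ]
}

# 3. Subsequent pages: pass the sort values of the last hit of the
#    previous page (the values below are made up) and the PIT id
#    returned by the previous response
GET /_search
{
  "size": 1000,
  "query": { "match_all": {} },
  "pit": { "id": "<pit id from previous response>", "keep_alive": "1m" },
  "sort": [
    { "modified_time": "asc" },
    { "_shard_doc": "asc" }
  ],
  "search_after": [1698765432000, 42]
}

# 4. When finished, release the snapshot: DELETE /_pit with the PIT id in the body
```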
First, this is a completely volunteer-driven forum, so there is no guarantee of an answer, even when pinging people directly, and especially not if they have not answered within 9h of your post. If you need support with SLAs, take a look at Elastic subscriptions.
Second, this is not an example I can reproduce locally, so it is hard to tell whether there is a problem with the requests.
Can you reproduce this behaviour if you do not use the JavaScript client but the Dev Tools console instead? If you do, can you share the requests and responses that you executed?
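For reference, a minimal scroll run in the Dev Tools console looks roughly like this (my-index is a placeholder for your index name):

```
# Start the scroll; sorting on _doc is the most efficient order for a full scan
POST /my-index/_search?scroll=1m
{
  "size": 1000,
  "sort": ["_doc"],
  "query": { "match_all": {} }
}

# Fetch each following batch with the _scroll_id from the previous response
POST /_search/scroll
{
  "scroll": "1m",
  "scroll_id": "<_scroll_id from previous response>"
}
```

Seeing your actual requests and the duplicated documents in the responses would make this much easier to debug.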