I need to fetch a large number of documents whose IDs I already know. I'm using the C# NEST library. What would be the most efficient way to fetch them? The number of IDs can vary from a few to more than 1 million.
I have seen MultiGetAsync, and also the option of using SearchAsync with the IDs added as multiple should clauses, in both cases splitting the work across multiple calls to the cluster.
To clarify the question: the fact that I'm using the C# NEST library shouldn't directly affect the performance of the call; what matters is the API that is called behind the scenes.
Let's say I have 1 million IDs of the docs I want to fetch. What would be the best way to fetch them: a multi get (mget) call, a search query, or some other way?
What would be the most efficient way?
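To make the two options concrete, here is a rough sketch of the request bodies I am comparing, written in Python only to show the JSON shape (it makes no calls to the cluster). The helper names and the batch size of 1000 are my own assumptions, and the ids query stands in for the should clauses mentioned above; I have not measured either variant.

```python
from typing import Any, Dict, Iterator, List


def chunk_ids(ids: List[str], batch_size: int) -> Iterator[List[str]]:
    """Split the full ID list into batches so no single request is huge."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(ids), batch_size):
        yield ids[start:start + batch_size]


def mget_body(batch: List[str]) -> Dict[str, Any]:
    """Body for POST /<index>/_mget: a plain list of document IDs."""
    return {"ids": list(batch)}


def ids_search_body(batch: List[str]) -> Dict[str, Any]:
    """Body for POST /<index>/_search using an ids query.

    size is set to the batch length so every matching document can come
    back in one page; the batch should stay within the index's
    max_result_window setting.
    """
    return {"size": len(batch), "query": {"ids": {"values": list(batch)}}}


if __name__ == "__main__":
    all_ids = [str(i) for i in range(2500)]  # placeholder IDs
    for batch in chunk_ids(all_ids, batch_size=1000):  # batch size is a guess
        print(len(mget_body(batch)["ids"]), ids_search_body(batch)["size"])
```

Either body would be sent once per batch, so 1 million IDs at this batch size means 1000 requests in both cases.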
If I want to improve the performance of the call, is disk performance (I/O speed) the first thing I should focus on scaling up?