How can I efficiently retrieve millions of records at a time and display them in a grid using the NEST client with Elasticsearch?

Hello everyone,
I am currently using the ScrollAll method of the NEST client to retrieve millions of records from Elasticsearch. The code I use is below:

    var numberOfShards = 3;

    // Scroll over all matching documents; "1m" keeps each scroll context alive
    // for one minute, and the second argument is the number of slices.
    objElasticClient.ScrollAll<PubmedIndexFields>("1m", numberOfShards, s => s
        // Integer division: 3 / 2 == 1, so the degree of parallelism is 1.
        .MaxDegreeOfParallelism(numberOfShards / 2)
        .Search(search => search
            .Query(q => q
                .QueryString(c => c
                    .Boost(1.1)
                    .Query(query.ToString())
                    .Analyzer(ElasticIndexConfiguration.CustomAnalyzerName)
                    .AllowLeadingWildcard()
                )
            )
            .Size(PageSize)
        )
    ).Wait(TimeSpan.FromMinutes(5), r =>
    {
        // Called once for each scroll response (one page of hits).
        var lstrecords = r.SearchResponse.Documents.ToList<PubmedIndexFields>();

        // Start one task per document to convert it to IndexEntityFields.
        TaskPool.AddRange(lstrecords.Select((t, i) =>
            Task<List<IndexEntityFields>>.Factory.StartNew((obj) =>
            {
                tokenSource.Token.ThrowIfCancellationRequested();
                var loc = (int)obj;
                var lstobject = lstrecords.ElementAt(loc);
                var indexEntityFieldses = new List<IndexEntityFields>();
                var op = new IndexEntityOperation();
                op.Parse_ElasticSearch(lstobject);
                indexEntityFieldses.Add(op.Fields);
                return indexEntityFieldses;
            }, i, tokenSource.Token)));

        Interlocked.Add(ref seenDocuments, r.SearchResponse.Hits.Count);
    });

But sometimes I do not get the complete result set.

Using the Scroll API, together with the ScrollAll observable helper in NEST, is an efficient way to retrieve a large volume of documents.

I would question the value of displaying millions of records to an end user in a UI, though. This sounds like a good use case for letting the end user search those records and filter down to the most relevant ones.

There are a couple of questions about this code:

  1. I'm not sure why a Task<T> is kicked off to perform a conversion of the document result; there is nothing awaiting the result of these asynchronous operations in the example provided. Looks like a synchronous operation would suffice here as I would imagine this conversion to be very quick.

  2. The LINQ .Select(...) and ElementAt(...) could be replaced with a simple foreach(...) over lstrecords, or a for loop, since the collection is List<T>.
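
To illustrate both points, here is a minimal sketch of a synchronous conversion that uses a plain foreach instead of one Task per document. The types PubmedIndexFields, IndexEntityFields and IndexEntityOperation are hypothetical stand-ins, since their real definitions are not shown in the question, and the Title property and the ConvertPage helper are invented for illustration only.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for the types used in the question.
public class PubmedIndexFields { public string Title { get; set; } }
public class IndexEntityFields { public string Title { get; set; } }

public class IndexEntityOperation
{
    public IndexEntityFields Fields { get; private set; }

    // Assumed to be a cheap, in-memory mapping from search document to grid entity.
    public void Parse_ElasticSearch(PubmedIndexFields source)
    {
        Fields = new IndexEntityFields { Title = source.Title };
    }
}

public static class PageConverter
{
    // Converts one page of scroll results synchronously, without a Task per document.
    public static List<IndexEntityFields> ConvertPage(IReadOnlyCollection<PubmedIndexFields> documents)
    {
        var converted = new List<IndexEntityFields>(documents.Count);
        foreach (var document in documents)
        {
            var op = new IndexEntityOperation();
            op.Parse_ElasticSearch(document);
            converted.Add(op.Fields);
        }
        return converted;
    }
}

public static class Program
{
    public static void Main()
    {
        var page = new List<PubmedIndexFields>
        {
            new PubmedIndexFields { Title = "first" },
            new PubmedIndexFields { Title = "second" }
        };

        var rows = PageConverter.ConvertPage(page);
        Console.WriteLine(rows.Count); // prints 2
    }
}
```

Inside the Wait callback, a helper like this could be called with r.SearchResponse.Documents and its result appended to whatever collection backs the grid. If the callback can run on more than one thread at a time, that collection would need to be thread-safe (for example a ConcurrentBag<T>) or guarded by a lock.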
