Nest Deserialization

Hi All,

I am using NEST 6.1.0 to retrieve a large JSON result set (200,000 documents) from AWS Elasticsearch. I am using the scroll API in batches of 10,000.

The Elasticsearch query is optimised; there are no issues executing it in Kibana.

However, getting the data over the wire and deserializing it takes almost 20-30 seconds.

I call client.SearchAsync(searchRequest) and then client.ScrollAsync(scrollRequest). Deserializing the JSON data takes a very long time.

The low-level client is faster at getting the response, but then I would have to deserialize manually using JsonTextReader, which would mean writing a lot of code, and I am not sure there would be any gain.

Is there a better way to deserialize the data?

Can you provide a typical example of a JSON document returned, and also provide a succinct but complete example of the configuration and code that you are using to scroll; please don't include any sensitive information :)

Have you determined that the performance impact is within NEST deserialization and not for example the network latency? I would expect the low level client to be slightly faster than the high level client, and likely Kibana is on the same VNet?

Yes, Kibana is on the same VNet, and the entire search takes 1792 ms, but with NEST it takes 20-30 seconds.

I have checked each request and response in Fiddler and they are quick; they are not taking that long.

var searchRequest = new SearchRequest(indexName)
{
    Profile = false,
    Size = 10000,
    Query = queryContainer,
    Scroll = new Time(5000),
    FilterPath = source, // Only required fields (customerref,location,cart,hits,took)
    Sort = new List<ISort> { new SortField { Field = "_doc" } }
};

var searchResponses = await client.SearchAsync<T>(searchRequest); // takes time

if (searchResponses.IsValid)
{
    completeResponse.Add(searchResponses);
    var loopCount = 0;
    var scrollsrequest = new ScrollRequest(searchResponses.ScrollId, new Time(5000));
    var scrollDocsCount = searchResponses.Documents.Count;
    while (scrollDocsCount < searchResponses.Total)
    {
        loopCount++;
        // Await instead of blocking on .Result inside an async method.
        var scrollResponse = await client.ScrollAsync<T>(scrollsrequest); // takes time
        // Stop on a failed or empty page rather than looping forever.
        if (!scrollResponse.IsValid || scrollResponse.Documents.Count == 0) break;
        completeResponse.Add(scrollResponse);
        // Count the documents in this page, not those in the first page.
        scrollDocsCount += scrollResponse.Documents.Count;
        scrollsrequest.ScrollId = scrollResponse.ScrollId;
    }
}

//search response

{
  "_source": {
    "customerRef": "XXXAbCDadsas",
    "location": {
      "lat": 122233334.455,
      "lon": 3445.66666
    },
    "cart": [
      {
        "prodname": "abcd",
        "qty": 1
      }
    ]
  }
}
{
_source:{.....}
_source:{.....}
}
Each document is deserialized to a customer type:

customerRef - string
location - geolocation
cart - custom type

To clarify, one search request takes 1792ms in Kibana, and one search request in Nest takes 20-30 seconds?

1792 ms to search the entire result set (200,000 documents); that is the took time shown in Kibana.

With NEST, search + scroll in batches of 10,000 plus deserialization of the JSON into a collection takes 20-30 seconds.

With the low-level client (without deserialization) it goes down to 7-10 seconds.

The time taken to deserialize all 200,000 documents is between 20 and 30 seconds.

To be clear, is Kibana returning all 200 000 documents in one request?

What's the performance using NEST with exactly the same query i.e. one request returning all documents?

It looks like the comparison is between one request returning all documents in Kibana and 20 sequential scroll requests (200 000 / 10 000); multiple requests are likely to be slower.
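
As a rough back-of-the-envelope check on the figures quoted in this thread (not a measurement), the sketch below works out how many sequential requests the scroll needs and what each batch costs on average. The names ScrollBudget, RequestCount and SecondsPerBatch are made up for illustration.

```csharp
using System;

public static class ScrollBudget
{
    // Number of sequential requests needed to page through totalDocs
    // (ceiling division, so a partial last batch still counts).
    public static int RequestCount(int totalDocs, int batchSize)
    {
        return (totalDocs + batchSize - 1) / batchSize;
    }

    // Average wall-clock time per batch for a given end-to-end total.
    public static double SecondsPerBatch(double totalSeconds, int requests)
    {
        return totalSeconds / requests;
    }

    public static void Main()
    {
        // 200,000 documents in batches of 10,000, as reported in the thread.
        int requests = RequestCount(200000, 10000);
        Console.WriteLine("Sequential requests: " + requests);

        // Reported totals: 20-30 s with NEST, 7-10 s with the low-level client.
        Console.WriteLine("NEST, per batch (s): "
            + SecondsPerBatch(20, requests) + " to " + SecondsPerBatch(30, requests));
        Console.WriteLine("Low level, per batch (s): "
            + SecondsPerBatch(7, requests) + " to " + SecondsPerBatch(10, requests));
    }
}
```

With the reported numbers that is 20 requests, averaging about 1.0 to 1.5 s per batch through NEST and 0.35 to 0.5 s per batch through the low-level client, so the per-batch figure is the one worth comparing against a single Kibana request of the same size.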

That's correct.

Getting the entire record set in one request is slower, hence we are getting the results using the scroll API. I tried getting the entire result in a single query and that was worse, so I had to try the scroll API.

I will send you the details of getting all records in a single query soon, so that it is comparable to Kibana.

I think deserialization is the bottleneck, as it has to deserialize around 200,000 documents.

I think it would be good to enable network tracing and record the timings here, as well as profile the application. Before doing any of this however, I strongly recommend upgrading to NEST 6.2.0 which contains some performance improvements.
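
One low-tech way to record such timings, before reaching for a profiler, is to time the fetch and the deserialization separately with Stopwatch. The sketch below is only that: PhaseTimer, fetch and deserialize are hypothetical names, and the two lambdas in Main are stand-ins for a low-level client call that returns the raw response body and for whatever deserializer is in use.

```csharp
using System;
using System.Diagnostics;

public static class PhaseTimer
{
    // Runs fetch, then deserialize, and reports how long each phase took.
    public static TResult Run<TRaw, TResult>(
        Func<TRaw> fetch,
        Func<TRaw, TResult> deserialize,
        out TimeSpan fetchTime,
        out TimeSpan deserializeTime)
    {
        var stopwatch = Stopwatch.StartNew();
        TRaw raw = fetch();
        fetchTime = stopwatch.Elapsed;

        stopwatch.Restart();
        TResult result = deserialize(raw);
        deserializeTime = stopwatch.Elapsed;

        return result;
    }

    public static void Main()
    {
        // Stand-in for "get the raw body over the wire" (assumption, not a NEST call).
        Func<string> fetch = () => "[1,2,3]";

        // Stand-in for "turn the raw body into typed documents" (assumption).
        Func<string, int[]> deserialize = raw =>
            Array.ConvertAll(raw.Trim('[', ']').Split(','), s => int.Parse(s));

        TimeSpan fetchTime;
        TimeSpan deserializeTime;
        int[] docs = Run(fetch, deserialize, out fetchTime, out deserializeTime);

        Console.WriteLine("Documents: " + docs.Length);
        Console.WriteLine("Fetch (ms): " + fetchTime.TotalMilliseconds);
        Console.WriteLine("Deserialize (ms): " + deserializeTime.TotalMilliseconds);
    }
}
```

Summing the two figures across all scroll batches would show whether the 20-30 seconds is dominated by the wire or by deserialization, which is the question a profiler or network trace then answers in more detail.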
