I am using NEST 6.1.0 to fetch a large JSON result set (200,000 documents) from AWS Elasticsearch. I am using the scroll API in batches of 10,000.
The Elasticsearch query is optimised; there are no issues executing it in Kibana.
However, getting the data over the wire and deserializing it takes almost 20-30 seconds.
I call client.SearchAsync(searchRequest) and then client.ScrollAsync(scrollRequest), and deserializing the JSON data takes a very long time.
The low level client is faster at getting the response, but then I would have to deserialize manually using JsonTextReader, which would mean writing a lot of code, and I am not sure there would be any gain.
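For a sense of how much code manual deserialization involves, here is a minimal sketch of streaming the hits with Json.NET's JsonTextReader. It assumes the Newtonsoft.Json package is referenced, that the response has the usual hits.hits[]._source shape with no other "_source" properties outside the hits (for example from a top_hits aggregation), and that T maps with Json.NET's default settings. HitStreamReader and ReadSources are hypothetical names, and whether this is actually faster than NEST would need measuring.

```csharp
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;

// Hypothetical helper, not part of NEST or Elasticsearch.Net.
public static class HitStreamReader
{
    // Lazily yields each "_source" object found in a search/scroll response body.
    public static IEnumerable<T> ReadSources<T>(Stream body)
    {
        var serializer = new JsonSerializer();
        using (var text = new StreamReader(body))
        using (var reader = new JsonTextReader(text))
        {
            while (reader.Read())
            {
                if (reader.TokenType == JsonToken.PropertyName
                    && (string)reader.Value == "_source")
                {
                    reader.Read(); // advance to the start of the _source object
                    // Deserialize consumes the whole object, so nested
                    // properties inside the document are never re-scanned.
                    yield return serializer.Deserialize<T>(reader);
                }
            }
        }
    }
}
```

The body stream would come from whatever low level call is used to fetch the response; that part is left out here because it depends on the client setup.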
Can you provide a typical example of a JSON document returned, along with a succinct but complete example of the configuration and code that you are using to scroll? Please don't include any sensitive information.
Have you determined that the performance impact is within NEST deserialization and not, for example, network latency? I would expect the low level client to be slightly faster than the high level client, and Kibana is likely on the same VNet?
Yes, Kibana is on the same VNet, and the entire search result takes 1792 ms there, but with NEST it takes 20-30 seconds.
I have checked each request and response in Fiddler and they are quick; the network is not what takes so long.
var searchRequest = new SearchRequest(indexName)
{
    Profile = false,
    Size = 10000,
    Query = queryContainer,
    Scroll = new Time(5000),
    FilterPath = source, // Only required fields (customerref,location,cart,hits,took)
    Sort = new List<ISort> { new SortField { Field = "_doc" } }
};
var searchResponses = await client.SearchAsync<T>(searchRequest); // takes time
if (searchResponses.IsValid)
{
    completeResponse.Add(searchResponses);
    var loopCount = 0;
    var scrollsrequest = new ScrollRequest(searchResponses.ScrollId, new Time(5000));
    var scrollDocsCount = searchResponses.Documents.Count;
    while (scrollDocsCount < searchResponses.Total)
    {
        loopCount++;
        // await rather than .Result, so the calling thread is not blocked
        var scrollResponse = await client.ScrollAsync<T>(scrollsrequest); // takes time
        // Stop on a failed or empty page so the loop cannot run forever
        if (!scrollResponse.IsValid || scrollResponse.Documents.Count == 0) break;
        completeResponse.Add(scrollResponse);
        // Count the documents in this page, not those in the first response
        scrollDocsCount += scrollResponse.Documents.Count;
        scrollsrequest.ScrollId = scrollResponse.ScrollId;
    }
}
What's the performance using NEST with exactly the same query i.e. one request returning all documents?
It looks like the comparison is between one request returning all documents in Kibana and 20 sequential scroll requests (200 000 / 10 000); multiple requests are likely to be slower.
Getting the entire record set in one request is slower, which is why we are fetching results with the scroll API. I tried getting the entire result in a single query and that was worse, so I had to try the scroll API.
I will send you details of getting all records in a single query soon so that it's comparable to Kibana.
I think deserialization is the bottleneck, as it has to deserialize around 200,000 documents.
I think it would be good to enable network tracing and record the timings here, as well as profile the application. Before doing any of this however, I strongly recommend upgrading to NEST 6.2.0 which contains some performance improvements.
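As a rough first step before full profiling, the timings could be recorded with a small Stopwatch wrapper such as the sketch below. Timing, Timed and MeasureAsync are hypothetical names, not part of NEST, and the wrapper only measures the total client-side time of a call (request, transfer and deserialization together); it does not separate them.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical holder for a result plus how long it took to obtain.
public sealed class Timed<T>
{
    public Timed(T result, TimeSpan elapsed)
    {
        Result = result;
        Elapsed = elapsed;
    }

    public T Result { get; }
    public TimeSpan Elapsed { get; }
}

// Hypothetical helper for timing any awaitable call.
public static class Timing
{
    public static async Task<Timed<T>> MeasureAsync<T>(Func<Task<T>> action)
    {
        var stopwatch = Stopwatch.StartNew();
        var result = await action().ConfigureAwait(false);
        stopwatch.Stop();
        return new Timed<T>(result, stopwatch.Elapsed);
    }
}
```

For example, each scroll call could be wrapped as Timing.MeasureAsync(() => client.ScrollAsync<T>(scrollsrequest)) and Elapsed logged per page. Comparing that figure with the server-side took value in the response and with the request time Fiddler shows should indicate roughly how much of the 20-30 seconds is spent on the client.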