Nested documents performance anomaly

roiwexler · April 14, 2019, 10:29am

Hi,
We're in the process of comparing Elasticsearch and Solr on indexing speed, which is very important to our application.
We've seen some strange behavior that we'd like to understand before relying on it.
Indexing 1M docs took about 43 seconds, but when we indexed the same documents nested as 1,000 parents with 1,000 child documents each, it took only 26 seconds.

We know that Lucene doesn't support nested documents because it has a flat object model, and we do see that each child document is in fact indexed as a separate document (when using the nested datatype). But when we search for a nested document we can't get it on its own: we always get its parent document together with all of its child documents.

Are we missing something here? Why does it behave like that? Can we get only the document we searched for, without its parent and the other, unrelated child documents?
Is this a valid way to speed up indexing?

Any help will be appreciated.
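
For readers following along, here is a minimal sketch of the setup described above and of one way to get back only the matching child objects: a `nested` query with `inner_hits`, with the parent `_source` switched off. The field names (`title`, `children`, `name`) are hypothetical, and the mapping is written in the typeless 7.x style, which is an assumption about the version in use. The request bodies are built as plain Python dicts, so nothing here needs a running cluster.

```python
# Hypothetical mapping: a parent document with a nested "children" field.
# Field names are illustrative and not taken from the original post.
nested_mapping = {
    "mappings": {
        "properties": {
            "title": {"type": "keyword"},
            "children": {
                "type": "nested",
                "properties": {"name": {"type": "keyword"}},
            },
        }
    }
}


def nested_query_with_inner_hits(path, field, value):
    """Build a search body that matches on a nested field.

    "_source": False suppresses the full parent source, and "inner_hits"
    asks Elasticsearch to return the nested objects that matched.
    """
    return {
        "_source": False,
        "query": {
            "nested": {
                "path": path,
                "query": {"term": {f"{path}.{field}": value}},
                "inner_hits": {},
            }
        },
    }


def lucene_doc_count(parents, children_per_parent):
    """Each nested object is stored as its own Lucene document,
    in addition to one Lucene document per parent."""
    return parents * (children_per_parent + 1)


search_body = nested_query_with_inner_hits("children", "name", "some-child")
total_lucene_docs = lucene_doc_count(1000, 1000)
```

With this shape, each hit is still a parent, but the matching children should appear under that hit's `inner_hits` section rather than as the whole parent source.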

dadoonet · April 14, 2019, 12:44pm

Did you look at the parent/child feature?
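
A rough sketch of what the parent/child approach could look like with the `join` field type, assuming a 6.x or 7.x cluster. The field name `relation`, the relation names `parent` and `child`, and the helper functions are all hypothetical. Children are separate documents here, so a search can return a child on its own, but each child has to be routed to the shard of its parent.

```python
import json

# Hypothetical mapping: one join field declaring a parent -> child relation.
join_mapping = {
    "mappings": {
        "properties": {
            "relation": {
                "type": "join",
                "relations": {"parent": "child"},
            }
        }
    }
}


def child_bulk_lines(index, parent_id, child_id, source):
    """Return the two NDJSON lines that index one child in a bulk request.

    The routing value must be the parent id so that parent and child
    end up on the same shard.
    """
    action = {"index": {"_index": index, "_id": child_id, "routing": parent_id}}
    doc = dict(source, relation={"name": "child", "parent": parent_id})
    return json.dumps(action) + "\n" + json.dumps(doc) + "\n"


def children_of_matching_parents(parent_query):
    """Build a search body that returns child documents whose parent matches."""
    return {
        "query": {
            "has_parent": {"parent_type": "parent", "query": parent_query}
        }
    }


lines = child_bulk_lines("my-index", "p1", "c1", {"name": "some-child"})
body = children_of_matching_parents({"term": {"title": "some-parent"}})
```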

Christian_Dahlqvist · April 14, 2019, 1:29pm

Exactly how did you do this? Did you keep the size of the bulk requests constant? If so, what sizes did you use? How many concurrent indexing threads did you use in each scenario? Did you verify that you loaded Elasticsearch to the same level in both cases by comparing monitoring data?
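
One way to keep bulk request sizes comparable between the two scenarios is to cap each request by payload bytes rather than by item count. The sketch below is only an illustration: `bulk_batches`, the index name and the byte limit are hypothetical, and it only builds the NDJSON bodies without sending them.

```python
import json


def bulk_batches(docs, index, max_bytes):
    """Yield bulk request bodies (NDJSON strings) of at most max_bytes each.

    A single entry larger than max_bytes is still emitted, alone in its batch.
    """
    batch = []
    size = 0
    for doc in docs:
        action = json.dumps({"index": {"_index": index}})
        entry = f"{action}\n{json.dumps(doc)}\n"
        entry_size = len(entry.encode("utf-8"))
        # Start a new batch when adding this entry would exceed the cap.
        if batch and size + entry_size > max_bytes:
            yield "".join(batch)
            batch, size = [], 0
        batch.append(entry)
        size += entry_size
    if batch:
        yield "".join(batch)


docs = [{"n": i} for i in range(10)]
batches = list(bulk_batches(docs, "test", 100))
```

Note that in the nested case a single bulk item carries a parent plus all of its children, so equal item counts do not mean equal payload sizes.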

roiwexler · May 5, 2019, 2:35pm

Q1 - Exactly how did you do this?
A1 - nested documents and using "type": "nested"

Q2 - Did you keep the size of the bulk requests constant?
If so, what sizes did you use?
A2 - Yes, about 10 MB or about 1,000,000 docs

Q3 - How many concurrent indexing threads did you use in each scenario?
A3 - just one

Q4 - Did you verify that you loaded Elasticsearch to the same level in both cases by comparing monitoring data?
A4 - No. Is there a way to check the ES load level? That would help us a lot.
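
On checking load: one option, besides the monitoring UI, is to poll the cat thread pool API for the `write` pool while each test runs and compare active threads, queue depth and rejections. The sketch below assumes a local unsecured node at `http://localhost:9200` and a version where the bulk pool is named `write` (6.3 or later); the helper names are hypothetical. The cat API returns numbers as strings in JSON output, hence the `int()` conversion.

```python
import json
from urllib.request import urlopen

# Assumed endpoint: a local node without authentication.
ES_URL = "http://localhost:9200"
CAT_WRITE_POOL = (
    "/_cat/thread_pool/write?format=json&h=node_name,active,queue,rejected"
)


def fetch_write_pool(base_url=ES_URL):
    """Fetch one row per node for the write thread pool (needs a live cluster)."""
    with urlopen(base_url + CAT_WRITE_POOL) as resp:
        return json.load(resp)


def summarize_write_pool(rows):
    """Sum active, queued and rejected counts across all nodes."""
    totals = {"active": 0, "queue": 0, "rejected": 0}
    for row in rows:
        for key in totals:
            totals[key] += int(row[key])
    return totals


# Made-up rows in the shape the cat API returns, for illustration only.
sample_rows = [
    {"node_name": "node-1", "active": "1", "queue": "0", "rejected": "0"},
    {"node_name": "node-2", "active": "2", "queue": "5", "rejected": "3"},
]
summary = summarize_write_pool(sample_rows)
```

With a single indexing thread, a mostly idle write pool in both runs would suggest the client, not Elasticsearch, is the bottleneck being measured.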

roiwexler · May 5, 2019, 2:39pm

If I'm not mistaken, this harms query time, doesn't it?

dadoonet · May 6, 2019, 10:05am

It might. More importantly, it will use more memory than without the parent/child feature, as it needs to do joins in memory.

system · June 3, 2019, 10:05am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
