Nested documents performance anomaly

Hi,
We're in the process of testing ES vs Solr for indexing speed, which is very important to our application.
We've witnessed strange behavior that we'd like to understand before using it.
When we indexed 1M docs it took about 43 seconds, but when we indexed the same documents nested as 1,000 parents with 1,000 child documents each, it took only 26 seconds.

We know that Lucene doesn't support nested documents, since it has a flat object model, and we do see that it in fact indexes each child document as a separate document (when using the nested datatype). But when searching for a nested document we can't get it on its own; we always get its parent document together with all of its child documents.

  1. Are we missing something here? Why does it behave like that? Can we get only the document we searched for, without its parent and the other, unrelated child documents?
  2. Is this a valid way to speed up our indexing?
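
Regarding question 1, one option that may be worth trying is the `inner_hits` option of the nested query, which asks Elasticsearch to report which child objects matched. The sketch below only builds the search body; the path `children` and the field `name` are hypothetical names, not taken from the original post, and whether this fits depends on your mapping and version.

```python
import json


def nested_query_with_inner_hits(path, field, value):
    """Build a search body that reports only the matching nested children."""
    return {
        # Skip the parent _source so the full parent (with every child) is not returned.
        "_source": False,
        "query": {
            "nested": {
                "path": path,
                "query": {"match": {f"{path}.{field}": value}},
                # inner_hits lists the nested objects that actually matched.
                "inner_hits": {},
            }
        },
    }


# Hypothetical path, field, and value, for illustration only.
body = nested_query_with_inner_hits("children", "name", "foo")
print(json.dumps(body, indent=2))
```

The matching children would then be expected under each hit's `inner_hits` section rather than in the parent's `_source`.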

Any help will be appreciated.

Did you look at the parent/child feature?

Exactly how did you do this? Did you keep the size of the bulk requests constant? If so, what sizes did you use? How many concurrent indexing threads did you use in each scenario? Did you verify that you loaded Elasticsearch to the same level in both cases by comparing monitoring data?

Q1 - Exactly how did you do this?
A1 - Nested documents, using "type": "nested"
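
For readers following along, a mapping of this kind might look roughly like the sketch below. Only "type": "nested" comes from the thread; the field names are hypothetical, and the typeless `mappings` layout assumes a recent Elasticsearch version (older versions need a type name). The small helper shows how many Lucene documents the nested layout produces, since each nested object is stored as its own hidden Lucene document next to its parent.

```python
import json

# Hypothetical field names; only "type": "nested" is taken from the thread.
mapping = {
    "mappings": {
        "properties": {
            "title": {"type": "keyword"},
            "children": {
                "type": "nested",
                "properties": {
                    "name": {"type": "keyword"},
                    "value": {"type": "integer"},
                },
            },
        }
    }
}


def lucene_doc_count(parents, children_per_parent):
    """Each parent becomes one Lucene doc plus one per nested child."""
    return parents * (children_per_parent + 1)


print(json.dumps(mapping, indent=2))
print(lucene_doc_count(1000, 1000))
```

So the nested run still writes about as many Lucene documents as the flat run; what changes is that Elasticsearch handles 1,000 top-level documents instead of 1,000,000.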

Q2 - Did you keep the size of the bulk requests constant?
If so, what sizes did you use?
A2 - Yes, about 10 MB, or about 1,000,000 docs
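
To make "constant bulk size" concrete, here is a minimal sketch of splitting documents into bulk request bodies with a byte limit. It only builds the newline-delimited bodies and sends nothing; the index name, the documents, and the 10,000-byte limit are made up for illustration.

```python
import json


def bulk_bodies(docs, index, max_bytes):
    """Yield NDJSON bulk bodies of roughly max_bytes each.

    A single document larger than max_bytes is still sent on its own.
    """
    action = json.dumps({"index": {"_index": index}})
    lines, size = [], 0
    for doc in docs:
        # Each bulk entry is an action line followed by the document source.
        entry = action + "\n" + json.dumps(doc) + "\n"
        entry_size = len(entry.encode("utf-8"))
        if lines and size + entry_size > max_bytes:
            yield "".join(lines)
            lines, size = [], 0
        lines.append(entry)
        size += entry_size
    if lines:
        yield "".join(lines)


# Made-up documents and limit, for illustration only.
docs = [{"id": i, "text": "x" * 50} for i in range(1000)]
bodies = list(bulk_bodies(docs, "test-index", 10_000))
print(len(bodies), max(len(b.encode("utf-8")) for b in bodies))
```

Using the same byte limit for both the flat and the nested run would make the two timings easier to compare.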

Q3 - How many concurrent indexing threads did you use in each scenario?
A3 - just one

Q4 - Did you verify that you loaded Elasticsearch to the same level in both cases by comparing monitoring data?
A4 - No. Is there a way to check the ES load level? That would help us a lot.
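
One possible way to compare load between the two runs is the node stats API (`GET /_nodes/stats`). The sketch below pulls a few fields out of a node stats response; the base URL is an assumption, the `sample` dictionary is a hand-written stand-in rather than real measurements, and the indexing thread pool is named `write` in newer versions and `bulk` in older ones, so both are checked.

```python
import json
import urllib.request


def fetch_node_stats(base_url="http://localhost:9200"):
    """Fetch OS, JVM, and thread pool stats (the base URL is an assumption)."""
    url = f"{base_url}/_nodes/stats/os,jvm,thread_pool"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def summarize_node_stats(stats):
    """Reduce a node stats response to a few load indicators per node."""
    summary = {}
    for node_id, node in stats.get("nodes", {}).items():
        pools = node.get("thread_pool", {})
        # "write" in newer versions, "bulk" in older ones.
        pool = pools.get("write") or pools.get("bulk") or {}
        summary[node.get("name", node_id)] = {
            "cpu_percent": node.get("os", {}).get("cpu", {}).get("percent"),
            "heap_used_percent": node.get("jvm", {}).get("mem", {}).get("heap_used_percent"),
            "indexing_queue": pool.get("queue"),
            "indexing_rejected": pool.get("rejected"),
        }
    return summary


# Hand-written stand-in for a response, not real measurements.
sample = {
    "nodes": {
        "abc123": {
            "name": "node-1",
            "os": {"cpu": {"percent": 42}},
            "jvm": {"mem": {"heap_used_percent": 55}},
            "thread_pool": {"write": {"queue": 3, "rejected": 0}},
        }
    }
}
print(summarize_node_stats(sample))
```

Sampling these values during each run, for example with `summarize_node_stats(fetch_node_stats())`, should show whether a single indexing thread keeps the node equally busy in both scenarios.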

If I'm not mistaken, this harms query time, doesn't it?

It might. More importantly, it will use more memory than without the parent/child feature, since it needs to do joins in memory.
