Hi,
We're in the process of testing ES vs Solr for indexing speed, which is very important to our application.
We've witnessed strange behavior that we wish to understand before using it.
When we indexed 1M docs it took about 43 seconds, but when we indexed the same documents nested as 1000 parents with 1000 child documents each, it took only 26 seconds.
We know that Lucene doesn't support nested documents because it has a flat object model, and we do see that it in fact indexes each child document as a separate document (when using the nested datatype). But when searching for a nested document we can't get it alone: we always get its parent document with all of its child documents.
Are we missing something here? Why does it behave like that? Can we get only the document we looked for, without its parent and the other, unrelated child documents?
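One way to get back only the matching child documents is the `inner_hits` option of the `nested` query. The sketch below only builds the search body; the path `children` and the field `name` are made-up names for illustration and are not taken from the mapping discussed in this thread.

```python
import json


def nested_query_with_inner_hits(path, field, value):
    """Build a search body that matches on a nested field.

    "inner_hits" asks Elasticsearch to return the individual nested
    documents that matched, and "_source": False suppresses the parent
    source (which would otherwise include every child).
    """
    return {
        "_source": False,
        "query": {
            "nested": {
                "path": path,
                "query": {"match": {f"{path}.{field}": value}},
                "inner_hits": {},
            }
        },
    }


# Hypothetical path, field, and value, for illustration only.
body = nested_query_with_inner_hits("children", "name", "alice")
print(json.dumps(body, indent=2))
```

The body would be sent to the `_search` endpoint of the index. Each hit then carries an `inner_hits` section holding only the children that matched, rather than the whole parent with all of its children.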
Exactly how did you do this? Did you keep the size of the bulk requests constant? If so, what sizes did you use? How many concurrent indexing threads did you use in each scenario? Did you verify that you loaded Elasticsearch to the same level in both cases by comparing monitoring data?
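To make the two runs comparable, it can help to cap every bulk request at the same byte size in both scenarios. The following is a minimal sketch, assuming the documents are plain dicts; the index name `test-index` and the 1000-byte limit are placeholders chosen so the example is small, not recommended values.

```python
import json


def bulk_bodies(docs, index, max_bytes):
    """Yield newline-delimited bulk request bodies of at most max_bytes.

    Each document becomes an action line plus a source line. A single
    pair larger than max_bytes is still emitted on its own.
    """
    lines, size = [], 0
    for doc in docs:
        pair = (
            json.dumps({"index": {"_index": index}}) + "\n"
            + json.dumps(doc) + "\n"
        )
        pair_size = len(pair.encode("utf-8"))
        # Flush the current batch before it would exceed the limit.
        if lines and size + pair_size > max_bytes:
            yield "".join(lines)
            lines, size = [], 0
        lines.append(pair)
        size += pair_size
    if lines:
        yield "".join(lines)


# Made-up documents, for illustration only.
docs = [{"id": i, "text": "x" * 50} for i in range(100)]
bodies = list(bulk_bodies(docs, "test-index", max_bytes=1000))
print(len(bodies), "bulk requests")
```

Note that a parent with 1000 nested children is a single action in the bulk body, so the flat and nested runs contain very different numbers of actions even when the byte size is the same.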
Q1 - Exactly how did you do this?
A1 - nested documents and using "type": "nested"
Q2 - Did you keep the size of the bulk requests constant?
If so, what sizes did you use?
A2 - Yes, about 10 MB, or about 1,000,000 docs
Q3 - How many concurrent indexing threads did you use in each scenario?
A3 - just one
Q4 - Did you verify that you loaded Elasticsearch to the same level in both cases by comparing monitoring data?
A4 - No. Is there a way to check the ES load level? This could help us a lot.
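On checking the load level: the nodes stats API (`GET _nodes/stats`) reports CPU usage and thread pool queues per node, and comparing those numbers between the two runs is one way to see whether the cluster was equally busy. Below is a rough sketch that summarizes such a response. It assumes the indexing thread pool is called `write` (older versions use a different name), and the sample response is made up and heavily trimmed.

```python
def summarize_load(node_stats):
    """Pull a few load indicators out of a nodes stats response."""
    summary = {}
    for node in node_stats.get("nodes", {}).values():
        # Assumption: the pool is named "write"; adjust for older versions.
        write_pool = node.get("thread_pool", {}).get("write", {})
        summary[node.get("name", "unknown")] = {
            "cpu_percent": node.get("os", {}).get("cpu", {}).get("percent"),
            "write_queue": write_pool.get("queue"),
            "write_rejected": write_pool.get("rejected"),
        }
    return summary


# Made-up, trimmed response for illustration only.
sample = {
    "nodes": {
        "abc123": {
            "name": "node-1",
            "os": {"cpu": {"percent": 37}},
            "thread_pool": {"write": {"queue": 0, "rejected": 0}},
        }
    }
}
print(summarize_load(sample))
```

With a single indexing thread, low CPU and an empty write queue in both runs would suggest the client, not Elasticsearch, is the bottleneck in the comparison.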