How long have you been running the test? Is 10000 your batch size?
The difference between your two setups is that in the first case, data is always indexed locally, while in the second case it sometimes needs to go to the other node (but the reduction still looks big to me). So what I suspect is that the second setup may look like it cannot index as fast as the first setup due to this additional overhead, but on the other hand, if you try to max out indexing speed by sending more data in parallel, then the second setup will perform better since it has more capacity overall (in particular 2x more computing power, I assume).
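To make "sending more data in parallel" concrete, here is a minimal sketch of a client that splits documents into batches and submits several batches at once. It uses only the Python standard library; `send_bulk` is a hypothetical callable standing in for whatever actually issues the bulk request in your client, and the `batch_size` and `workers` values are assumptions to tune, not recommendations.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def chunks(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def index_in_parallel(docs, send_bulk, batch_size=10000, workers=4):
    """Send batches concurrently and return the total number of docs sent.

    `send_bulk` is a hypothetical function: it takes one batch (a list of
    documents), sends it as a single bulk request, and returns how many
    documents it indexed.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(send_bulk, chunks(docs, batch_size)))


def fake_send_bulk(batch):
    """Stand-in sender for trying the sketch without a cluster."""
    return len(batch)


if __name__ == "__main__":
    total = index_in_parallel(range(25000), fake_send_bulk, workers=2)
    print(total)
```

Raising `workers` step by step while watching indexing throughput should show whether the two-node setup overtakes the single-node one once a single request stream is no longer the bottleneck.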
Yes, that is the expectation. But we couldn't attain the same performance as with a single data node. The configuration of both data nodes is the same. What could be done to improve the performance?