Indexing is a CPU-intensive operation. If the single node has more cores available for processing, it may be able to handle the indexing load more efficiently than multiple nodes with fewer cores.
Then, if your indices have multiple shards, a single powerful node might be able to handle the indexing into these shards more efficiently than if the shards are spread across multiple less powerful nodes.
Indexing performance is not necessarily limited by CPU and/or RAM. Storage performance can often be a limiting factor and limited network performance can also play a part if there are multiple nodes in the cluster.
When you index into a single node cluster you know that all the primary shards reside on that node and there is little overhead. If you instead index into a cluster it is likely that a lot of the data need to be sent between nodes for indexing as the primary shards may be spread out and indexed data alo need to be forwarded to replica shards. This adds overhead but can also increase resilience.
You did not specify how large the difference in indexing throughput was between your two tests so it is hard to tell whether this is expected or not. It would also be useful if you could provide information about the network capacity and storage used.
Apache, Apache Lucene, Apache Hadoop, Hadoop, HDFS and the yellow elephant
logo are trademarks of the
Apache Software Foundation
in the United States and/or other countries.