Has anyone measured indexing performance with more shards versus more
data paths?
I ask because I wonder whether having more data paths might be
cost-effective on EC2.
For example, consider two configurations (5 shards, 0 replicas for
A: 1 Standard Large instance with 5 EBS (8GB) volumes
B: 5 Standard Small instances, each with 1 EBS (8GB) volume
A Standard Large is exactly 4 times as powerful, and 4 times as expensive, as a Standard Small.
But A costs less, because one more EBS volume (4 more, to be exact) costs less than
one more Standard Small instance.
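To make the cost comparison concrete, here is a rough sketch in Python. The prices are made-up placeholders, not real EC2 rates; the only relationship taken from above is that a Large costs 4 times a Small. The function name and constants are just for illustration.

```python
def monthly_cost(instances, instance_price, volumes, volume_price):
    """Total cost of a layout: instances plus attached EBS volumes."""
    return instances * instance_price + volumes * volume_price

# Hypothetical placeholder prices in arbitrary units, NOT real EC2 rates.
SMALL = 1.0         # one Standard Small instance
LARGE = 4 * SMALL   # assumption from the post: Large costs exactly 4x Small
EBS = 0.1           # one 8GB EBS volume, assumed far cheaper than an instance

cost_a = monthly_cost(1, LARGE, 5, EBS)  # A: 1 Large with 5 volumes
cost_b = monthly_cost(5, SMALL, 5, EBS)  # B: 5 Smalls with 1 volume each

print("A:", cost_a, "B:", cost_b)
```

Plugging in actual instance and volume prices would show how large the gap really is.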
In traditional RDBMS, fsync-heavy database operations benefit a LOT from
So there, it would have been clear that A outperforms B.
But ES does as few fsyncs (commits) as possible, according to the video
"Road to a Distributed Search Engine".
So, how do they perform?
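If someone wants to try this, a minimal timing harness might look like the sketch below. It is not tied to any ES client: `index_batch` is a hypothetical callable you would replace with your own bulk-indexing call, and `fake_index_batch` is only a stand-in so the script runs on its own.

```python
import time

def docs_per_second(index_batch, batches):
    """Run index_batch over every batch and return documents per second."""
    total = 0
    start = time.perf_counter()
    for batch in batches:
        index_batch(batch)
        total += len(batch)
    elapsed = time.perf_counter() - start
    return total / elapsed if elapsed > 0 else float("inf")

def fake_index_batch(batch):
    """Stand-in for a real bulk-indexing call (hypothetical); just sleeps."""
    time.sleep(0.001)

# Five batches of 100 small placeholder documents.
batches = [[{"id": i} for i in range(100)] for _ in range(5)]
rate = docs_per_second(fake_index_batch, batches)
print("docs/sec:", round(rate))
```

Running the same document set against configuration A and then B, with identical index settings, would give comparable numbers.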
I appreciate your thoughts and ideas.
Also, does it make sense to have a shared benchmark for ES? Or already have