We just created a 2-node benchmarking cluster (ES 0.20.1, Ubuntu 10.04, kernel
2.6.32-41) using a single index with 1 shard and 1 replica.
We started bulk indexing, and at about 75M documents the second node, holding
the replica, failed with filesystem inode exhaustion. There were about 7700
segments on the replica node, while there were only about 60 segments on the
node holding the primary shard.
All settings were pretty much at their default values (in particular the
merge policy settings), except for refresh_interval, which was set to -1.
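For reference, the index was created roughly like this (the host and index name below are placeholders; the only non-default setting is refresh_interval):

```shell
# Hypothetical index creation matching the setup described above.
# refresh_interval: -1 disables periodic refreshes during bulk indexing;
# everything else (including the merge policy) is left at defaults.
curl -XPUT 'http://localhost:9200/bench_index' -d '{
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 1,
      "refresh_interval": "-1"
    }
  }
}'
```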
The question is: how come the replica node ended up with so many segments?
It looks like it did not respect the index merge policy. I know the
performance "best practice" for bulk indexing is to use no replicas and add
them after the bulk load. But regardless of that, isn't such a huge
difference in segment count between the primary shard and the replica a
problem?
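For context, the "no replicas during bulk" practice I mentioned would look roughly like this (host and index name are again placeholders); replicas can be toggled live through the update-settings API:

```shell
# Drop the replica before the bulk load...
curl -XPUT 'http://localhost:9200/bench_index/_settings' -d '{
  "index": {"number_of_replicas": 0}
}'

# ...run the bulk indexing here...

# ...then add the replica back; it recovers by copying the
# already-merged segments from the primary shard.
curl -XPUT 'http://localhost:9200/bench_index/_settings' -d '{
  "index": {"number_of_replicas": 1}
}'
```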