Hi all,
For a special use case where data loss is no problem, we have an index with three shards and no replicas on our 3-server Elasticsearch cluster. We don't quite understand what happens in this case if one of the servers crashes:
- The data of the shard on the crashed server may be (partially) corrupted or even completely lost; this is no problem for us. However, how does the index as a whole (that is, its remaining two shards) behave while the one server is down? Is the index unusable for object creation even if the new object would normally get routed to one of the two available shards? Or is it still possible to use the two available shards, so that only requests routed to the unavailable shard fail?
- Now, if the crashed server is restarted after some time, and the index had the "check_on_startup: fix" option activated at creation time, will the crashed server automatically make the shard available again? Again, data loss is no problem for us, even if the server decides that it has to completely empty the shard to repair it. Would it re-create the shard if it had been deleted from the file system? (We seem to observe a case where the crashed server wouldn't restart properly until we deleted the data on the file system.)
- If fixing does not work as described above, an alternative for us would be to delete the index altogether after a server crash and create it anew. In that case, there would be three shards but, for the time being, just two servers, so two of the shards would be assigned to the same server. After the crashed server is restarted, how long would it take for Elasticsearch to notice that it should rebalance the index and distribute the three shards evenly across the three servers again? Can this rebalancing lag be configured or otherwise influenced somehow?
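To make the setup concrete, here is a rough sketch of the delete-and-recreate alternative from the last point. It only builds the request bodies and does not talk to a cluster. It assumes that the option we set corresponds to the index setting `index.shard.check_on_startup`, next to `index.number_of_shards` and `index.number_of_replicas`; the index name `scratch-index` and the helper function names are made up for illustration.

```python
import json

# Hypothetical index name, not the real one from our cluster.
INDEX_NAME = "scratch-index"


def build_index_settings(shards=3, replicas=0, check_on_startup="fix"):
    """Build the create-index body: three primary shards, no replicas.

    Assumption: "check_on_startup: fix" maps to index.shard.check_on_startup.
    """
    return {
        "settings": {
            "index": {
                "number_of_shards": shards,
                "number_of_replicas": replicas,
                "shard": {"check_on_startup": check_on_startup},
            }
        }
    }


def recreate_requests(index_name=INDEX_NAME):
    """Return (method, path, body) tuples: delete the index, then create it."""
    body = json.dumps(build_index_settings())
    return [
        ("DELETE", "/" + index_name, None),
        ("PUT", "/" + index_name, body),
    ]


if __name__ == "__main__":
    # Print the two requests instead of sending them anywhere.
    for method, path, body in recreate_requests():
        print(method, path, body or "")
```

The two tuples could be sent with any HTTP client against one of the remaining servers; whether the new shards are then spread as hoped once the third server returns is exactly the open question above.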
And finally, a question related to the ones above: If you use the field type "attachment" implemented by the Attachment plugin and have replicas, is text extraction (for example, getting plain text out of an MS Word .docx) repeated for each replica? Or is text extraction done once on the primary shard and only the extracted plain text replicated? If the latter is not the case by default, is it possible to get this behavior by means of an appropriate mapping, for example by not "storing" the source but only the "attachment" field?
Thanks and best regards
Heiko