Hi Everyone,
In this great writeup regarding indexing performance @ https://www.elastic.co/guide/en/elasticsearch/guide/current/indexing-performance.html there are the following sentences regarding multiple data directories:
Use RAID 0. Striped RAID will increase disk I/O, at the obvious expense of
potential failure if a drive dies. Don’t use mirrored or parity RAIDS since
replicas provide that functionality.
Alternatively, use multiple drives and allow Elasticsearch to stripe data across
them via multiple path.data directories.
The following is probably a silly newbie question, but I need to ask it nonetheless. If I configure an ES node with multiple data directories, does it write each shard in its entirety to a single data directory, or does it spread the segments of a shard across multiple data directories a la RAID 0? I am guessing it's the former, but the wording "...stripe data across them..." makes me think it's possibly the latter.
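For concreteness, this is the kind of setup I mean. It is only a minimal sketch of elasticsearch.yml, and the two mount points are made-up placeholders, not my real paths:

```yaml
# elasticsearch.yml (sketch; both paths below are hypothetical examples)
# Each entry is assumed to be a directory on a separate physical drive.
path.data:
  - /mnt/disk1/elasticsearch
  - /mnt/disk2/elasticsearch
```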
Please confirm, thanks!
--John