Striping--just to be clear I am not missing something

Hi Everyone,

In this great writeup regarding indexing performance @ https://www.elastic.co/guide/en/elasticsearch/guide/current/indexing-performance.html there are the following sentences regarding multiple data directories:

Use RAID 0. Striped RAID will increase disk I/O, at the obvious expense of
potential failure if a drive dies. Don’t use mirrored or parity RAIDS since
replicas provide that functionality.

Alternatively, use multiple drives and allow Elasticsearch to stripe data across
them via multiple path.data directories.

The following is probably a silly newbie question, but I need to ask it nonetheless. If I configure an ES node with multiple data directories, does it write shards in their entirety to separate data directories, or does it spread segments of shards across multiple data directories a la RAID 0? I am guessing it's the former, but the use of the verb "...stripe data across them..." makes me think it's possibly the latter.

Please confirm, thanks!

--John

Hi John,

We don't spread segments across paths. See the docs:

The path.data settings can be set to multiple paths, in which case all paths will be used to store data (although the files belonging to a single shard will all be stored on the same data path).

Daniel

Ah, perfect, thanks Daniel! Linking this to the other question I posted to this forum yesterday (http://bit.ly/2au1TSS): if there are multiple shards for the same index on an ES node, does ES parallelize a query on that index and, if so, does ES put the shards in different data directories to maximize I/O?

Thanks

--John

Yes.

I don't believe it has anything for that. IIRC it tries to balance used disk space, but I'll check.

It tries to balance disk space usage, though it makes some funny assumptions.
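
For anyone finding this thread later, here is a toy Python sketch of the behavior described above: each shard lands whole on a single data path, and the path is picked by free disk space. This is not Elasticsearch's actual allocation code. The `DataPath` class, the `place_shard` helper, the path names, the sizes, and the "most free space wins" rule are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DataPath:
    """One path.data entry (hypothetical model, not an Elasticsearch class)."""
    name: str
    free_bytes: int
    shards: list = field(default_factory=list)


def place_shard(paths, shard_id, shard_bytes):
    """Place the whole shard on exactly one path; nothing is striped."""
    # Assumption: "most free space wins" is a simplification, not the
    # real heuristic (which, per the thread, makes its own assumptions).
    target = max(paths, key=lambda p: p.free_bytes)
    if target.free_bytes < shard_bytes:
        raise RuntimeError(f"no data path can hold {shard_id}")
    target.shards.append(shard_id)
    target.free_bytes -= shard_bytes
    return target.name


# Hypothetical node with two data paths and three equally sized shards.
paths = [DataPath("/data1", 100), DataPath("/data2", 80)]
placement = {s: place_shard(paths, s, 30) for s in ["idx[0]", "idx[1]", "idx[2]"]}
print(placement)
```

Note that in this model two shards of the same index can end up on the same path, since only free space is considered and nothing tries to separate them for I/O.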

Cool, thanks! I really appreciate your time.

--John