Distributors vs RAID0

What is the advantage of using Elasticsearch's store distributors (JBOD) over using RAID0?

As far as I can tell, if I lose a drive in either case, I lose the whole node until the data can be recovered. Is the distributor smart about which files it recovers, recovering only the files that were on the failed drive? Is there some other advantage I am missing?

Shaun

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/4d9091bb-a472-466d-9e68-83fad57c5449%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

You will lose all data on the node if one drive dies, whether you use ES
striping or RAID0.

I don't know if there is a practical (throughput) difference, but logically
they are the same.
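To put a number on "logically they are the same": if each of n drives fails independently with some probability p over a given period, a node whose files are spread over all of them (RAID0, or ES distributing files across data paths) is lost as soon as any one drive fails. A minimal sketch, assuming independent failures; the function name and the 3% figure are hypothetical:

```python
def node_loss_probability(p_drive: float, n_drives: int) -> float:
    """Probability that at least one of n independent drives fails.

    With RAID0, or with ES spreading a node's files over all data paths,
    any single drive failure takes the whole node's data with it, so this
    is also the probability of losing the node.
    """
    if not 0.0 <= p_drive <= 1.0:
        raise ValueError("p_drive must be between 0 and 1")
    if n_drives < 1:
        raise ValueError("n_drives must be at least 1")
    # The node survives only if every drive survives.
    return 1.0 - (1.0 - p_drive) ** n_drives


if __name__ == "__main__":
    # Hypothetical per-drive failure probability over some fixed period.
    p = 0.03
    for n in (1, 2, 4, 8):
        print(f"{n} drive(s): node loss probability {node_loss_probability(p, n):.3f}")
```

The formula is identical for both layouts, which is the point: adding drives to a node raises its chance of failure the same way whether the striping is done by a RAID controller or by the distributor.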

(In reply to Shaun Senecal, senecaso@gmail.com, 23 January 2015 at 10:21)

There are no advantages of JBOD over RAID0. RAID0 is far superior for
striped reads and writes: with a hardware RAID controller, the read/write
performance of all the physical drives adds up. JBOD is limited to the
performance of a single physical drive.

There is only one rare exception: mixing physical drives of different
capacities. RAID0 striping cannot make full use of such drives, because
striped RAID0 uses only as much of each drive as the smallest one provides,
whereas JBOD adds up the full capacity of all the drives.
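A rough back-of-the-envelope comparison of the two trade-offs above, under simplifying assumptions: ideal striping, a single sequential stream, no controller overhead. The `Drive` type, the helper names and the drive sizes and speeds are all hypothetical:

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Drive:
    capacity_gb: float      # usable size of the physical drive
    throughput_mb_s: float  # sustained sequential throughput


def raid0_capacity(drives: Sequence[Drive]) -> float:
    # Striping takes the same amount of space from every member, so each
    # drive contributes only as much as the smallest one.
    return len(drives) * min(d.capacity_gb for d in drives)


def jbod_capacity(drives: Sequence[Drive]) -> float:
    # Independent data paths: the full capacity of every drive is usable.
    return sum(d.capacity_gb for d in drives)


def raid0_stream_throughput(drives: Sequence[Drive]) -> float:
    # Ideal striping: one large sequential stream is spread over all
    # members and is paced by the slowest drive.
    return len(drives) * min(d.throughput_mb_s for d in drives)


def jbod_stream_throughput(drives: Sequence[Drive], path_index: int) -> float:
    # A file lives on a single data path, so one stream is limited to
    # the throughput of that one drive.
    return drives[path_index].throughput_mb_s


if __name__ == "__main__":
    # Hypothetical node with two 1 TB drives and one 2 TB drive.
    node = [Drive(1000, 150), Drive(1000, 150), Drive(2000, 150)]
    print("RAID0 capacity (GB):", raid0_capacity(node))
    print("JBOD capacity (GB): ", jbod_capacity(node))
    print("RAID0 stream (MB/s):", raid0_stream_throughput(node))
    print("JBOD stream (MB/s): ", jbod_stream_throughput(node, 0))
```

With these made-up numbers, RAID0 gives 3000 GB and about 450 MB/s for one stream, while JBOD gives 4000 GB but only 150 MB/s per stream. The sketch models a single stream only; it says nothing about several shards writing to different JBOD paths at the same time.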

And you are correct: in either case, losing a drive means losing the
machine. ES handles node failures with replica shards on other machines,
not with a file-based repair strategy.

Jörg

(In reply to Mark Walkom, markwalkom@gmail.com, Fri, Jan 23, 2015 at 12:38 AM)

Thanks for the confirmations, Jörg and Mark.

It seems like a lot of development effort to implement this feature for
little to no gain over RAID0, so I wonder if the folks at Elasticsearch
have bigger plans for it. Perhaps file-based recovery and/or a distributor
that keeps all files of a given shard together on the same drive, so that a
failed drive results in the loss of only a few shards rather than an entire
node. For now, though, it seems RAID is the way to go.
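The effect of the shard-per-drive idea can be sketched by comparing which shards a single drive failure takes out. This is a hypothetical model, not an existing Elasticsearch feature: the spread layout assumes the worst case where every shard has files on every drive, and the alternative assigns whole shards to drives round-robin:

```python
from typing import Dict, Iterable, List, Set


def place_spread(shards: Iterable[str], n_drives: int) -> Dict[str, Set[int]]:
    # Worst case for RAID0 striping or a per-file distributor: every
    # shard has files on every drive, so it depends on all of them.
    return {shard: set(range(n_drives)) for shard in shards}


def place_shard_per_drive(shards: Iterable[str], n_drives: int) -> Dict[str, Set[int]]:
    # Hypothetical alternative: keep all files of a shard on one drive,
    # assigning shards to drives round-robin.
    return {shard: {i % n_drives} for i, shard in enumerate(shards)}


def shards_lost(placement: Dict[str, Set[int]], failed_drive: int) -> List[str]:
    # A shard is lost if any drive holding its files has failed.
    return sorted(s for s, drives in placement.items() if failed_drive in drives)


if __name__ == "__main__":
    shards = [f"shard{i}" for i in range(8)]
    print("spread:         ", shards_lost(place_spread(shards, 4), failed_drive=1))
    print("shard-per-drive:", shards_lost(place_shard_per_drive(shards, 4), failed_drive=1))
```

With 8 shards on 4 drives, losing drive 1 costs all 8 shards in the spread layout but only 2 in the shard-per-drive layout; those 2 would still have to be recovered from replicas on other nodes.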

Shaun

(In reply to Jörg Prante, Friday, January 23, 2015 at 2:53:53 AM UTC-8)

Now that I re-read the relevant Elasticsearch documentation, I see the
possible misconception. The term "RAID0" in the text is meant to give the
picture of an ES data directory as one logical drive whose many files are
spread over several physical drives. RAID0 striping on hardware controllers
works differently: the volume is split into fixed-size chunks (stripe units)
that are distributed round-robin over the physical drives and read/written
in parallel, and the filesystem and its free space play no part in it.
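The chunk-level layout used by common RAID0 implementations is a pure address mapping, which is why it knows nothing about files or free space. A minimal sketch, assuming a simple round-robin layout with a fixed chunk size; the function name and the 64 KiB chunk size are hypothetical, and real controllers differ in details:

```python
from typing import Tuple


def raid0_locate(logical_offset: int, chunk_size: int, n_drives: int) -> Tuple[int, int]:
    """Map a byte offset on a RAID0 volume to (drive index, offset on that drive).

    Assumes the simple layout where fixed-size chunks are assigned to the
    member drives in round-robin order.
    """
    if logical_offset < 0 or chunk_size <= 0 or n_drives <= 0:
        raise ValueError("offset must be >= 0, chunk_size and n_drives > 0")
    chunk_index = logical_offset // chunk_size  # which chunk of the volume
    drive = chunk_index % n_drives              # chunks rotate over the drives
    stripe = chunk_index // n_drives            # stripe row on that drive
    return drive, stripe * chunk_size + logical_offset % chunk_size


if __name__ == "__main__":
    chunk = 64 * 1024  # hypothetical 64 KiB chunk size
    for off in (0, chunk, 4 * chunk, 5 * chunk + 10):
        print(off, "->", raid0_locate(off, chunk, n_drives=4))
```

Consecutive chunks land on consecutive drives, so a large sequential read touches all drives in parallel; a distributor, by contrast, decides per file and never splits a single file across drives.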

The ES store distributor was implemented to handle the situation where the
data directories on a node have different amounts of free storage. With the
setting "least_used" (which is the default; it really means "most free"), ES
places each new file on the mount point that currently has the most free
space, so that all the available space across the data paths gets used.
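A minimal sketch of the "least_used" selection rule described above, assuming each new file goes wholly to one data path and free space is tracked in memory. The `DataPath` type, the `place_file` helper and the mount names are hypothetical; this is not Elasticsearch's actual code:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataPath:
    name: str        # e.g. a mount point configured as a data path
    free_bytes: int  # free space currently available on that mount


def pick_least_used(paths: List[DataPath]) -> DataPath:
    # "least_used": the path with the most free space wins;
    # max() returns the first path listed when there is a tie.
    return max(paths, key=lambda p: p.free_bytes)


def place_file(paths: List[DataPath], size: int) -> str:
    # Put one new file entirely on the chosen path and book its space.
    target = pick_least_used(paths)
    if target.free_bytes < size:
        raise OSError(f"not enough free space on {target.name} for {size} bytes")
    target.free_bytes -= size
    return target.name


if __name__ == "__main__":
    # Hypothetical node with two data paths of unequal free space.
    paths = [DataPath("/mnt/data1", 100), DataPath("/mnt/data2", 300)]
    print([place_file(paths, 100) for _ in range(4)])
```

The emptier path absorbs new files until the free space evens out, after which the paths alternate, so both mounts end up full. Note that a single file never spans two paths, which is why losing one mount still breaks every shard that had a file on it.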

I don't think the distributor is of any value for future index recovery
strategies; it is too low-level. Recovery will become more intelligent with
the advent of numbered sequences in Lucene segments, which will allow
incremental recovery and replication of shards.

Jörg

(In reply to Shaun Senecal, senecaso@gmail.com, Fri, Jan 23, 2015 at 4:56 PM)

There are plans to stripe data at the level of complete segments, so that
if a disk/mount dies, only the segments on it are lost.
I'm not sure if there is an ETA on that.

(In reply to Jörg Prante, joergprante@gmail.com, 24 January 2015 at 05:00)