Why not have maximum replication?

Christopher_Ambler_2 · November 17, 2014, 8:11pm

Can someone explain to me why, if disk space is not an issue, I don't want
maximum replication such that every node has every shard?

It seems to me that there would be no real downside here as long as I'm not
worried about filling up a disk and my updates happen infrequently and in a
timely manner.

Am I missing the obvious?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/897baef9-225e-450f-ab5d-a114ff54be9d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Jure_Koren · November 17, 2014, 8:21pm

Your filesystem cache can only hold so much. If you're hitting disks often,
performance will suffer a lot.

Best,

--
Jure Koren

Christopher_Ambler_2 · November 18, 2014, 7:12am

I don't see how that's an issue.

Data needs to be stored somewhere. If I'm hitting disk often, I'm going to
do so no matter what, no matter where.

So why not have the data on all nodes?

warkolm · November 18, 2014, 7:17am

There's a difference between reading from the FS cache and doing an actual
read from the FS, the latter being a lot slower. If you have SSDs, then this
might be feasible.

But overall you're potentially wasting a lot of resources.

Jure_Koren · November 18, 2014, 9:14am

On Monday 17 of November 2014 23:12:26 Christopher Ambler wrote:

I don't see how that's an issue.

Data needs to be stored somewhere. If I'm hitting disk often, I'm going to
do so no matter what, no matter where.

So why not have the data on all nodes?

Because your nodes usually only have enough memory to store a limited subset
of your index. If nodes store all shards, they have to keep parts of all
shards in memory for these to be queried efficiently. If you have enough
memory, that will usually work fine.
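
To put rough numbers on the memory argument, here is a back-of-the-envelope sketch in Python. The cluster size, data volume, and cache size are made up for illustration, and `data_per_node_gb` and `cache_coverage` are hypothetical helper names, not anything from Elasticsearch. It assumes shards are spread evenly across nodes and that queries touch all of the data about equally, which real workloads rarely do.

```python
def data_per_node_gb(primary_data_gb, replicas, nodes):
    """Average index data stored on each node, assuming an even shard spread."""
    total_copies = 1 + replicas  # the primary plus its replicas
    return primary_data_gb * total_copies / nodes


def cache_coverage(primary_data_gb, replicas, nodes, fs_cache_gb):
    """Fraction of a node's local index data that fits in its filesystem cache."""
    per_node = data_per_node_gb(primary_data_gb, replicas, nodes)
    return min(1.0, fs_cache_gb / per_node)


if __name__ == "__main__":
    # Hypothetical cluster: 5 nodes, 200 GB of primary data,
    # 32 GB of RAM left over for the filesystem cache on each node.
    nodes, primary_gb, cache_gb = 5, 200, 32
    for replicas in (1, 2, 4):  # 4 replicas on 5 nodes = every node has every shard
        per_node = data_per_node_gb(primary_gb, replicas, nodes)
        coverage = cache_coverage(primary_gb, replicas, nodes, cache_gb)
        print(f"replicas={replicas}: {per_node:.0f} GB per node, "
              f"cache covers {coverage:.0%}")
```

Under these assumptions, one replica leaves each node with 80 GB of local data, of which the cache can hold 40%, while a copy of everything on every node means 200 GB per node and only 16% cached.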

If there is a way to keep shards stored, but prevent elasticsearch from
querying them (or otherwise trying to read them from disk), I'd sure like to
know about it, because it would make a great way to do backups or recovery
from a "passive" node without disrupting the file system cache on active nodes
under load.

--
Jure Koren

Duncan_Innes · November 18, 2014, 9:36am

Depends on how many nodes you have of course, but if you go for a replica
on every node, your write performance will take a hit. So high throughput
logging will be difficult.

I see the point though - if you have the performance for it and if
you're not trying to log at huge rates, then there is definitely extra
security in a replica on every node.

Is it worth it? Up to you really. We currently have 5 nodes with 2
replicas on every index. This allows us to lose any 2 machines in the
cluster and keep our heads above water. This is a reasonable state of
affairs for us.

Our setup is a bit like a RAID 5 disk array, whereas you are looking at
RAID 1. Horses for courses I guess.
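
The trade-off between the two setups can be sketched in a few lines of Python. `replication_profile` is a hypothetical helper written for this comparison, not an Elasticsearch API. It only counts shard copies: "node losses survivable" here means at least one copy of every shard is still around, and it says nothing about master election or whether the surviving nodes can carry the load.

```python
def replication_profile(nodes, replicas):
    """Summarise what a replica count means for a cluster of `nodes` data nodes."""
    copies = 1 + replicas  # the primary plus its replicas
    if copies > nodes:
        # Two copies of the same shard are never placed on one node,
        # so the extra replicas would simply stay unassigned.
        raise ValueError("more shard copies than nodes to hold them")
    return {
        "copies_per_shard": copies,
        "node_losses_survivable": replicas,
        "fraction_of_index_per_node": copies / nodes,
    }


if __name__ == "__main__":
    print("2 replicas on 5 nodes:", replication_profile(5, 2))
    print("replica on every node:", replication_profile(5, 4))
```

With 2 replicas on 5 nodes each node carries about 60% of the index and any 2 nodes can be lost. With a copy on every node, each node carries the whole index and 4 nodes can be lost.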

D

jprante · November 18, 2014, 9:54am

Maximum replication kills write performance.
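
As a rough illustration of the write cost, the Python sketch below builds the settings body that "maximum replication" would correspond to and counts the indexing work it implies. The helper names and the 5-node, one-million-document figures are assumptions for illustration. `index.number_of_replicas` is the real index setting; the count assumes every replica indexes each document itself, so work grows linearly with the number of copies.

```python
import json


def full_replication_settings(data_nodes):
    """Index settings body that puts a copy of every shard on every data node."""
    return {"index": {"number_of_replicas": data_nodes - 1}}


def shard_index_operations(documents, replicas):
    """Each document is indexed once on the primary and once on every replica."""
    return documents * (1 + replicas)


if __name__ == "__main__":
    # Body for an update-settings request on a hypothetical 5-node cluster.
    print(json.dumps(full_replication_settings(5)))

    docs = 1_000_000
    for replicas in (1, 2, 4):
        ops = shard_index_operations(docs, replicas)
        print(f"replicas={replicas}: {ops:,} shard-level index operations")
```

The `index.auto_expand_replicas` setting with a value of `"0-all"` is an alternative that keeps the replica count in step with the number of nodes as the cluster grows or shrinks.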

Jörg
