Can someone explain to me why, if disk space is not an issue, I don't want
maximum replication such that every node has every shard?
It seems to me that there would be no real downside here as long as I'm not
worried about filling up a disk and my updates happen infrequently and in a
timely manner.
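"Every node has every shard" means the replica count is one less than the number of data nodes. Below is a minimal Python sketch that builds the index-settings body for that case. The helper name is hypothetical. `number_of_replicas` and `auto_expand_replicas` are Elasticsearch index settings. `<index>` in the comment is a placeholder, and the sketch assumes every node holds data.

```python
import json


def full_replication_settings(data_nodes: int) -> dict:
    """Hypothetical helper: settings that put a copy of every shard on every
    data node, i.e. 1 primary + (data_nodes - 1) replicas.

    Assumes all nodes hold data. Elasticsearch never places two copies of the
    same shard on one node, so any replicas beyond this stay unassigned.
    """
    if data_nodes < 1:
        raise ValueError("need at least one data node")
    return {"index": {"number_of_replicas": data_nodes - 1}}


# Alternative: let the replica count follow the cluster size automatically.
AUTO_EXPAND_TO_ALL_NODES = {"index": {"auto_expand_replicas": "0-all"}}

# Bodies for the update-index-settings API (PUT /<index>/_settings).
print(json.dumps(full_replication_settings(5)))
print(json.dumps(AUTO_EXPAND_TO_ALL_NODES))
```

The fixed count has to be changed by hand whenever nodes join or leave. The `"0-all"` form adjusts the count as the cluster changes.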
On Monday 17 of November 2014 12:11:15 Christopher Ambler wrote:
Am I missing the obvious?
Your filesystem cache can only hold so much. If you're hitting disks often,
performance will suffer a lot.
Data needs to be stored somewhere. If I'm hitting disk often, I'm going to
do so no matter what, no matter where.
So why not have the data on all nodes?
On Monday 17 of November 2014 23:12:26 Christopher Ambler wrote:
I don't see how that's an issue.
So why not have the data on all nodes?
Because your nodes usually only have enough memory to cache a limited subset
of your index. If every node stores all shards, each node has to keep the hot
parts of all shards in memory for them to be queried efficiently. With fewer
replicas, each node only has to cache its own share of the index. If you have
enough memory, storing everything everywhere will usually work fine.
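The memory argument comes down to simple arithmetic. Below is a minimal Python sketch of how much data each node holds as replicas are added. It assumes shards are spread evenly and no node holds two copies of one shard. If searches are spread over all copies, that share is also roughly what each node would like to keep in its filesystem cache. The function name and the 100 GB / 5-node figures are hypothetical.

```python
def per_node_share(index_gb: float, nodes: int, replicas: int) -> float:
    """Approximate GB of index data held by each node.

    Assumes shards are balanced evenly and no node holds two copies of the
    same shard, so (replicas + 1) copies are spread over `nodes` nodes.
    """
    copies = replicas + 1
    if not 1 <= copies <= nodes:
        raise ValueError("copies (replicas + 1) must be between 1 and nodes")
    return index_gb * copies / nodes


# Hypothetical 100 GB index on 5 nodes: the per-node share grows with every
# replica until, at 4 replicas, each node holds the whole index.
for r in range(5):
    print(r, per_node_share(100, 5, r))
```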
If there is a way to keep shards stored but prevent Elasticsearch from
querying them (or otherwise trying to read them from disk), I'd sure like to
know about it, because it would make a great way to do backups or recovery
from a "passive" node without disrupting the filesystem cache on active nodes
under load.
Depends on how many nodes you have, of course, but if you go for a replica
on every node, your write performance will take a hit, because every document
has to be indexed on every node. So high-throughput logging will be difficult.
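The write cost can be sketched the same way. This assumes every copy, primary or replica, indexes each document itself, as Elasticsearch replicas did at the time, and that shards are balanced evenly. The function name and the 10,000 docs/s rate are hypothetical. The sketch shows that with a replica on every node, adding nodes does not reduce the indexing work each node does.

```python
def per_node_index_rate(docs_per_sec: float, nodes: int, replicas: int) -> float:
    """Documents per second each node must index.

    Assumes each of the (replicas + 1) copies indexes every document itself
    and that shard copies are spread evenly across nodes.
    """
    copies = replicas + 1
    if not 1 <= copies <= nodes:
        raise ValueError("copies (replicas + 1) must be between 1 and nodes")
    return docs_per_sec * copies / nodes


# Hypothetical 10,000 docs/s ingest. With one replica, more nodes spread the
# load; with a replica on every node (replicas = n - 1), each node indexes
# everything no matter how large the cluster gets.
for n in (3, 5, 10):
    print(n, per_node_index_rate(10_000, n, 1), per_node_index_rate(10_000, n, n - 1))
```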
I see the point, though: if you have the performance for it and you're not
trying to log at huge rates, then there is definitely extra security in a
replica on every node.
Is it worth it? Up to you really. We currently have 5 nodes with 2
replicas on every index. This allows us to lose any 2 machines in the
cluster and keep our heads above water. This is a reasonable state of
affairs for us.
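The "5 nodes, 2 replicas, lose any 2 machines" claim can be checked by brute force. Below is a minimal Python sketch that tries every combination of failed nodes. It assumes a simple round-robin placement, which is a hypothetical stand-in and not Elasticsearch's actual allocator. That placement shares the one property that matters here: the copies of a shard always sit on distinct nodes.

```python
from itertools import combinations


def allocate(shards: int, nodes: int, replicas: int) -> list[set[int]]:
    """Round-robin placement: copy c of shard s goes to node (s + c) % nodes.

    A simplified stand-in for a real allocator; the copies of one shard always
    land on distinct nodes as long as replicas + 1 <= nodes.
    """
    copies = replicas + 1
    if copies > nodes:
        raise ValueError("more copies than nodes")
    return [{(s + c) % nodes for c in range(copies)} for s in range(shards)]


def survives(placement: list[set[int]], failed: set[int]) -> bool:
    """True if every shard still has at least one copy on a live node."""
    return all(holders - failed for holders in placement)


def tolerates(shards: int, nodes: int, replicas: int, k: int) -> bool:
    """True if the cluster survives every possible loss of k nodes."""
    placement = allocate(shards, nodes, replicas)
    return all(survives(placement, set(down)) for down in combinations(range(nodes), k))


print(tolerates(5, 5, 2, 2))  # True: any 2 nodes can fail
print(tolerates(5, 5, 2, 3))  # False: some set of 3 failures loses a shard
```

With r replicas, every shard has r + 1 copies on distinct nodes, so any r failures leave at least one copy. One more failure can take out all copies of some shard.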
Our setup is a bit like a RAID 5 disk array, whereas you are looking at
RAID 1. Horses for courses, I guess.
D