I use Elasticsearch for log file analysis, with rolling indices created on a
daily basis. I run 2 Elasticsearch servers behind a load balancer. The data is
sent to the load balancer and then inserted on the appropriate server. Each
daily index has 1 shard and 1 replica, so there is "one file" on both servers.
The data is queried through the load balancer as well, and I always
query the current day and the same day last week.
My question now is: Would it be better to use a daily index with 2 or more
shards?
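To make the query pattern concrete (the current day plus the same weekday one week earlier), here is a minimal sketch that computes the two daily index names. The `logs-` prefix and the `YYYY.MM.DD` suffix are assumptions for illustration only; the thread does not say how the indices are actually named.

```python
from datetime import date, timedelta

def daily_index_names(today, prefix="logs-"):
    """Return index names for `today` and the same weekday one week earlier.

    The prefix and date format are hypothetical; adjust to your own naming.
    """
    last_week = today - timedelta(days=7)
    return [prefix + d.strftime("%Y.%m.%d") for d in (today, last_week)]

# Both names can be joined with a comma to search the two indices together.
print(",".join(daily_index_names(date(2014, 2, 10))))
```

With daily indices, each search in this pattern touches exactly two indices, so the number of shards searched per query is twice the shard count per index.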
I'd add another node into the cluster to allow easier quorum and prevent
split brain.
Then split the index into (at least) 3 shards to spread the load. Ideally
you want to try to get one shard per node.
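The quorum argument can be checked with a little arithmetic. The sketch below assumes every node is master-eligible and that a majority means "more than half"; the function names are made up for this example. In Elasticsearch releases of that era the majority size was typically configured through `discovery.zen.minimum_master_nodes`.

```python
def quorum(master_eligible_nodes):
    """Smallest majority of the master-eligible nodes."""
    return master_eligible_nodes // 2 + 1

def tolerated_failures(master_eligible_nodes):
    """How many nodes can be lost while a majority can still be formed."""
    return master_eligible_nodes - quorum(master_eligible_nodes)

for n in (2, 3):
    print(n, "nodes: quorum", quorum(n), "tolerated failures", tolerated_failures(n))
```

With 2 nodes the majority is 2, so no node may fail if split brain is to be ruled out; with 3 nodes the majority is still 2, which leaves room for one failure.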
Thanks. The split brain problem aside: Is it faster for elasticsearch to
read a shard than a replica?
(Quoted: Mark Walkom's reply of Monday, February 10, 2014 11:44:41 AM UTC+1 to Valentin's original post of 10 February 2014 21:22.)
Your question boils down to "if there is more than one shard per node, does
it harm search or index speed?"
The answer is: it depends. If your nodes have plenty of CPU and RAM
and your shards are not too big, you can happily run more than one shard per
node. If you put too many shards on a node, you will notice it when the node
runs out of resources.
If you plan to use more than two nodes, creating more than 1 shard per
index is a good idea, so that the index can be spread across all the nodes.
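A rough way to see how one index's shards line up with the node count is to count shard copies and divide them over the nodes. This is only a sketch: it assumes copies are spread as evenly as possible, and the function name is invented for the example.

```python
def shard_copies_per_node(primaries, replicas, nodes):
    """Return (total copies, fewest per node, most per node) for one index,
    assuming the copies are spread as evenly as possible."""
    total = primaries * (1 + replicas)
    low, extra = divmod(total, nodes)
    high = low + 1 if extra else low
    return total, low, high

# Current setup: 1 shard, 1 replica, 2 nodes.
print(shard_copies_per_node(1, 1, 2))
# Same index after adding a third node: one node holds nothing for this index.
print(shard_copies_per_node(1, 1, 3))
# 3 shards, 1 replica, 3 nodes: every node holds two copies.
print(shard_copies_per_node(3, 1, 3))
```

The second case shows why a single-shard index stops benefiting from extra nodes: there are only two copies to place, so a third node stays idle for that index.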
Just FYI: barring differences in the low-level segments, a primary shard and
its corresponding replica shard should, for all intents and purposes, perform
the same for queries.
Jorge is correct: you can think of shards as the unit for scaling content and
resources horizontally, and of replicas as the unit for scaling query
performance and redundancy horizontally.
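Both knobs are ordinary index settings, `number_of_shards` and `number_of_replicas`. The sketch below only builds the request body for creating an index; the values 3 and 1 are taken from the suggestions in this thread, and the helper name is hypothetical. The shard count is fixed when the index is created, while the replica count can be changed later.

```python
import json

def index_settings(shards, replicas):
    """Build a create-index request body with the given shard layout."""
    return {
        "settings": {
            "number_of_shards": shards,      # fixed at index creation
            "number_of_replicas": replicas,  # can be changed on a live index
        }
    }

print(json.dumps(index_settings(3, 1), indent=2))
```

For daily indices, a body like this would usually live in an index template so that each new day's index picks it up automatically.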
(In reply to Valentin's follow-up of Monday, February 10, 2014 11:01:01 AM UTC-5.)
One final thought: Theoretically, if every node holds only one shard and there are no replicas, each node would only search the data it holds and no redundant data. Shouldn't that have a (small) impact on indexing and searching?