Performance question about shards and replicas


(Valentin Pletzer) #1

Hi,

I use elasticsearch for logfile analysis. I use rolling indices on a daily
basis. I use 2 elasticsearch servers behind a load balancer. The data is
sent to the load balancer and then inserted on whichever server it selects.
I use 1 index with 1 shard and 1 replica, so there is "one file" on both
servers. The data is queried through the load balancer as well, and I am
always querying the current day and the same day last week.

My question now is: Would it be better to use a daily index with 2 or more
shards?

Greetings,
Valentin

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/3b3fb4ff-0bc1-4b83-ac2d-d421d6f8ad06%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Mark Walkom) #2

I'd add another node into the cluster to allow easier quorum and prevent
split brain.
Then split the index into (at least) 3 shards to spread the load. Ideally
you want to try to get one shard per node.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com
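
A minimal sketch of the suggestion above, assuming daily indices named `logs-YYYY.MM.DD` (a hypothetical naming scheme; the thread does not say how the indices are named). It builds the body of an index template so that every new daily index would get three primary shards and one replica, and it computes the majority quorum for a given number of master-eligible nodes. The `template` key follows the Elasticsearch 1.x template format that was current at the time of this thread; later versions changed it, so treat the body as an illustration rather than something to paste into a newer cluster.

```python
import json


def quorum(master_eligible_nodes: int) -> int:
    """Smallest majority of the given number of master-eligible nodes."""
    return master_eligible_nodes // 2 + 1


def daily_index_template(pattern: str, shards: int, replicas: int) -> dict:
    """Body for an Elasticsearch 1.x index template (PUT /_template/<name>)."""
    return {
        "template": pattern,  # hypothetical index name pattern
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        },
    }


if __name__ == "__main__":
    body = daily_index_template("logs-*", shards=3, replicas=1)
    print(json.dumps(body, indent=2))
    for nodes in (2, 3):
        print(f"{nodes} nodes -> quorum {quorum(nodes)}")
```

With two nodes the quorum is 2, so losing either node leaves no majority. With three nodes the quorum is still 2, so one node can fail without risking split brain.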



(Valentin Pletzer) #3

Thanks. The split brain problem aside: Is it faster for elasticsearch to
read a shard than a replica?



(Jörg Prante) #4

Your question boils down to "if there is more than one shard per node, does
it harm search or index speed?"

The answer is, it depends. If you have powerful nodes regarding CPU and RAM
and your shards are not too big, you can use more than one shard happily.
If you use too many shards per node, you will notice it when you run out of
resources.

If you plan to use more than two nodes, creating more than one shard per
index is a good idea, so the index can be spread across all the nodes.

Jörg
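
The arithmetic behind this can be sketched as follows. This is a rough model under the assumption that shard copies of a single index are spread evenly over the nodes; the function names are made up for the illustration and are not part of any Elasticsearch API.

```python
import math


def shard_copies(primaries: int, replicas: int) -> int:
    """Total shard copies of one index: each primary plus its replicas."""
    return primaries * (1 + replicas)


def max_copies_per_node(primaries: int, replicas: int, nodes: int) -> int:
    """Copies held by the busiest node, assuming an even spread."""
    return math.ceil(shard_copies(primaries, replicas) / nodes)


def nodes_without_a_copy(primaries: int, replicas: int, nodes: int) -> int:
    """Nodes that cannot hold any copy of this index."""
    return max(0, nodes - shard_copies(primaries, replicas))


if __name__ == "__main__":
    for primaries in (1, 3):
        print(
            f"{primaries} primaries, 1 replica, 3 nodes:",
            f"{shard_copies(primaries, 1)} copies,",
            f"{max_copies_per_node(primaries, 1, 3)} per node at most,",
            f"{nodes_without_a_copy(primaries, 1, 3)} nodes without a copy",
        )
```

For a single index, one primary with one replica gives two copies, so a third node would hold none of that index. Three primaries with one replica give six copies, which is two per node on three nodes.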



(Binh Ly) #5

Just FYI, barring differences in the low-level segments, for all intents
and purposes a primary shard and its corresponding replica shard should be
the same in terms of query performance.

Jörg is correct: you can think of shards as a horizontal content/resource
scaling unit, and of replicas as a horizontal query performance and
redundancy scaling unit.
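
To illustrate the two scaling units, here is a toy model of a search (not Elasticsearch code). The coordinating step queries one copy of every shard, primary or replica, and merges the per-shard top hits. The data, the scores and the function name are made up for the illustration, and the random choice only stands in for whatever copy selection the cluster actually makes.

```python
import heapq
import random

# One index with two shards; each shard has two identical copies
# (a primary and a replica). Hits are (score, doc_id) pairs.
SHARDS = {
    0: [[(0.9, "a"), (0.4, "c")], [(0.9, "a"), (0.4, "c")]],
    1: [[(0.7, "b"), (0.2, "d")], [(0.7, "b"), (0.2, "d")]],
}


def search(shards, k, rng=random):
    """Query one copy per shard and merge the per-shard top-k hits."""
    per_shard = []
    for copies in shards.values():
        copy = rng.choice(copies)  # primary or replica, same data
        per_shard.append(heapq.nlargest(k, copy))
    merged = heapq.nlargest(k, (hit for hits in per_shard for hit in hits))
    return [doc_id for _score, doc_id in merged]


if __name__ == "__main__":
    print(search(SHARDS, k=3))
```

Because the copies of a shard hold the same documents, the result does not depend on which copy is picked. Adding shards splits the data into smaller pieces, while adding replicas gives concurrent searches more copies to choose from and keeps the data available if a node is lost.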



(Valentin Pletzer) #6

Thanks for all the answers guys!

One final thought: Theoretically, if every node has only one shard and no replicas, each node would only search the data it holds and no redundant data. Shouldn't that have a (small) impact on indexing and searching the data?


