Query about refresh interval and primary node distribution

Arjit_Gupta · July 19, 2013, 2:33am

Hi,

We were benchmarking Elasticsearch on our production cluster and
experimenting to find the optimal refresh interval.
The cluster has 3 machines, each with 32 GB of memory and 8 cores. We have
given 24 GB to ES.

In our benchmark we send store-document requests with 50 threads
from 2 different servers. We have only 5 indices, each with 5 primary shards
and 2 replicas.
Our findings:
a. With a refresh interval of 60sec, the 95th percentile for storing a
document is ~300ms, and the load average on all the systems is about 20.
b. With a refresh interval of 1sec, the 95th percentile for storing a
document is ~9ms, and the load average is 1.2.

I used to think a larger refresh interval would give us better
results, but that is not happening in this case.

Can anyone explain why this is happening?

When a node in the cluster goes down, its primary shards are distributed
to the other active servers in the cluster. When the node comes back up,
shards are distributed to it again, but it does not hold any primary shards.
I want to know when the redistribution of primary shards takes place.

Thanks in advance
Arjit

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Arjit_Gupta · July 19, 2013, 11:15am

bump


jprante · July 19, 2013, 2:54pm

It depends on what you want to measure.

From the scenario described, I assume you ingest docs with random IDs for a
while. If you select 60sec for auto refresh while constantly pushing docs,
the docs are queued and the Lucene indexer has more work to do at
the write/read switch, compared to the 1sec interval. You don't specify what
the times ~300ms and ~9ms mean; I assume they are queries, not NRT gets.
For bulk indexing, it is recommended to switch the refresh off and enable
it again after the bulk indexing.
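
To make the bulk-indexing advice concrete, here is a rough Python sketch of how the refresh could be switched off and restored through the index settings API (PUT on `<index>/_settings` with `index.refresh_interval`, where "-1" disables the periodic refresh). The host, the index name, the helper names and the 1s value to restore are assumptions for illustration only, not details from this thread.

```python
import json
import urllib.request


def refresh_settings_request(base_url, index, interval):
    """Build (but do not send) a PUT request that sets index.refresh_interval."""
    body = json.dumps({"index": {"refresh_interval": interval}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def send_request(request):
    """Send the request and return the raw response body."""
    with urllib.request.urlopen(request) as response:
        return response.read()


def bulk_load(base_url, index, load_docs, restore_interval="1s", send=send_request):
    """Disable the periodic refresh, run load_docs(), then restore the interval."""
    send(refresh_settings_request(base_url, index, "-1"))
    try:
        load_docs()  # hypothetical callable that performs the actual bulk indexing
    finally:
        # Restore the refresh even if the bulk load raises.
        send(refresh_settings_request(base_url, index, restore_interval))
```

The `send` parameter is only there so the sequence can be tried without a running cluster, for example by passing a function that records the requests.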

I do not fully understand the question about the primary shards. The
distribution of primary shards does not contribute to performance, and there
is no use in tracking the nodes where the primaries are. If you want to
address certain shards for faster access, you are better off with routing.
Note that 24G is an unusually high setting. If you have 32G RAM, 16G should
be fine.

Jörg

