How to tune ES for maximum write performance


(Thomas Peuss) #1

Hello!

Which parameters should be tuned to maximize the write performance of
ES?

CU
Thomas


(Shay Banon) #2

In general, the built-in settings are good to go for high write performance. You might want to increase index.refresh_interval to reduce how often the index gets refreshed.

On Monday, February 6, 2012 at 10:08 AM, Thomas Peuss wrote:

Hello!

Which parameters should be tuned to maximize the write performance of
ES?

CU
Thomas


(Thomas Peuss) #3

Hello Shay!

On 7 Feb., 11:04, Shay Banon kim...@gmail.com wrote:

In general, the built-in settings are good to go for high write performance. You might want to increase index.refresh_interval to reduce how often the index gets refreshed.

We have set index.refresh_interval to 120s and it shows good results
with this setting. Thank you for the hint.

CU
Thomas


(Shay Banon) #4

Note that you can update it in real time using the update settings API.
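
A minimal sketch of what such a call could look like, assuming a node reachable at http://localhost:9200 and an index called my_index (both hypothetical, as are the helper names). It builds a PUT request against the index's _settings endpoint with the 120s value mentioned earlier in the thread; sending it is kept in a separate function so the request can be inspected first.

```python
import json
import urllib.request


def build_refresh_interval_request(base_url, index, interval):
    """Build (but do not send) an update settings request for one index."""
    # Body shape used by the update settings API for index-level settings.
    body = json.dumps({"index": {"refresh_interval": interval}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def send(request):
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


# Hypothetical host and index name; adjust both to your cluster.
request = build_refresh_interval_request("http://localhost:9200", "my_index", "120s")
```

Calling send(request) would apply the change to a running cluster; the same helper can be used to set the interval back to 1s once a bulk load is finished.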

On Wednesday, February 8, 2012 at 10:19 AM, Thomas Peuss wrote:

Hello Shay!

On 7 Feb., 11:04, Shay Banon <kim...@gmail.com> wrote:

In general, the built-in settings are good to go for high write performance. You might want to increase index.refresh_interval to reduce how often the index gets refreshed.

We have set index.refresh_interval to 120s and it shows good results
with this setting. Thank you for the hint.

CU
Thomas


(Frederic) #5

Hi Shay,

We need to have docs available for search as soon as possible, while at the
same time we index 50 docs/sec, 1 KB each.

We're using the default refresh_interval, which is 1 sec, right?

Would you recommend decreasing that value (if possible) given such an
indexing rate? I'm not sure how much it could affect system load or GC
times.

We route docs in a 6-node, 20-shard, 1-replica cluster. Could adding
more replicas affect indexing times, due to the replication
of data among servers?

Thanks,

On 9 feb, 04:55, Shay Banon kim...@gmail.com wrote:

Note that you can update it in real time using the update settings API.


(Shay Banon) #6

Yes, the default refresh_interval is 1 sec. I'm not sure I understand the question, though. Are you unable to reach an indexing rate of 50 docs per sec?

Only the "first" replica (the first additional copy of a shard) will affect the latency of an index operation with sync indexing, since the operation is replicated to all replicas in parallel. Obviously, more replicas mean more indexing operations happening on the cluster as a whole.
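
As a rough back-of-the-envelope sketch of that last point, assuming the "20 shards" in the question are 20 primary shards and that every document is indexed once on its primary and once on each replica (the function names are illustrative only):

```python
def total_shard_copies(primary_shards, replicas):
    """Number of shard copies in the cluster: primaries plus all replica copies."""
    return primary_shards * (1 + replicas)


def cluster_index_ops_per_sec(docs_per_sec, replicas):
    """Indexing operations per second summed over the whole cluster."""
    # Each document is indexed on one primary and on every replica of it.
    return docs_per_sec * (1 + replicas)


# Figures from the question: 20 shards, 1 replica, 50 docs/sec.
copies_now = total_shard_copies(20, 1)
ops_now = cluster_index_ops_per_sec(50, 1)
ops_with_two_replicas = cluster_index_ops_per_sec(50, 2)
```

Under these assumptions the cluster holds 40 shard copies and performs 100 indexing operations per second; a second replica would raise that to 150, while per-request latency stays roughly the same because the replicas are written in parallel.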

On Thursday, February 9, 2012 at 8:10 PM, Frederic wrote:

Hi Shay,

We need to have docs available for search as soon as possible, while at the
same time we index 50 docs/sec, 1 KB each.

We're using the default refresh_interval, which is 1 sec, right?

Would you recommend decreasing that value (if possible) given such an
indexing rate? I'm not sure how much it could affect system load or GC
times.

We route docs in a 6-node, 20-shard, 1-replica cluster. Could adding
more replicas affect indexing times, due to the replication
of data among servers?

Thanks,


(Frederic) #7

Sorry if the question wasn't that clear (as you see, English is not my
first language) :)

My point is: if I want docs to be searchable, in the worst case, sooner than
1 sec after indexing, and set refresh_interval to 0.5 sec, for instance (assuming
such a value is valid), could that noticeably affect the system load?
I'm not sure how heavy or resource-demanding the refresh process is,
but I guess GC will certainly run more often.

Thanks for your time and patience,
Frederic

On 12 feb, 08:28, Shay Banon kim...@gmail.com wrote:

Yes, the default refresh_interval is 1 sec. I'm not sure I understand the question, though. Are you unable to reach an indexing rate of 50 docs per sec?

Only the "first" replica (the first additional copy of a shard) will affect the latency of an index operation with sync indexing, since the operation is replicated to all replicas in parallel. Obviously, more replicas mean more indexing operations happening on the cluster as a whole.


(Otis Gospodnetić) #8

Hi Frederic,

We recently worked on improving search performance of an ElasticSearch
cluster for a client.
One of the first things we did was increase the refresh_interval, and
that had a very positive impact on search performance.

Otis

Search Analytics - http://sematext.com/search-analytics/index.html

On Feb 13, 9:43 am, Frederic focampo...@gmail.com wrote:

Sorry if the question wasn't that clear (as you see, English is not my
first language) :)

My point is: if I want docs to be searchable, in the worst case, sooner than
1 sec after indexing, and set refresh_interval to 0.5 sec, for instance (assuming
such a value is valid), could that noticeably affect the system load?
I'm not sure how heavy or resource-demanding the refresh process is,
but I guess GC will certainly run more often.

Thanks for your time and patience,
Frederic


(Shay Banon) #9

Setting the refresh interval to a smaller value will cause a higher load on the cluster. Whether that is something your cluster can handle really depends on your requirements and on what your testing shows.
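
To put rough numbers on that trade-off, here is a small sketch using the figures from this thread (50 docs/sec, 20 shards). It assumes documents are routed evenly across shards and that each shard refreshes once per interval while documents keep arriving; the function names are illustrative, and the real cost depends on hardware, mappings, and merge activity, so it is no substitute for testing.

```python
def refreshes_per_minute(interval_seconds):
    """How often one shard is refreshed per minute at a given refresh interval."""
    return 60 / interval_seconds


def docs_per_refresh_per_shard(docs_per_sec, shards, interval_seconds):
    """Average number of new docs each refresh of one shard makes searchable."""
    # Assumes documents are spread evenly over all shards.
    return docs_per_sec / shards * interval_seconds


for interval in (0.5, 1, 120):
    print(
        interval,
        refreshes_per_minute(interval),
        docs_per_refresh_per_shard(50, 20, interval),
    )
```

Under these assumptions, halving the interval from 1 sec to 0.5 sec doubles the refreshes per shard from 60 to 120 per minute while each refresh covers only about one new document, whereas the 120s setting mentioned earlier results in one refresh every two minutes covering around 300 documents per shard.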

On Monday, February 13, 2012 at 4:43 PM, Frederic wrote:

Sorry if the question wasn't that clear (as you see, English is not my
first language) :)

My point is: if I want docs to be searchable, in the worst case, sooner than
1 sec after indexing, and set refresh_interval to 0.5 sec, for instance (assuming
such a value is valid), could that noticeably affect the system load?
I'm not sure how heavy or resource-demanding the refresh process is,
but I guess GC will certainly run more often.

Thanks for your time and patience,
Frederic


(Frederic) #10

Fair enough. I just wanted to double check that interval could be reduced
and what it entails. I'll test it.

Thanks Kimchy and Otis

