Periodic temporary cluster slowdown/freeze during long index process

vpunski · January 4, 2011, 8:22am

I have a 9-node ES cluster: Q9400 CPU, 8G RAM, SATA2 disks, 4G heap
size.
ES runs in embedded mode, with a small web logic layer wrapping it.
All 9 nodes are load balanced.
GC is tuned properly; there are no long tenured collections at all.

We have a relatively simple document mapping with 5 fields and one
object with dynamic fields; 40 million entries in total are to be indexed.

We get a pretty slow index rate of 100 index ops/second per node,
with a response time of ~50-200 msec (sync mode).

Roughly once a minute (anywhere from 30 seconds up to 2 minutes) we
experience a strange "global" freeze lasting 10-20 seconds, during which
insert times increase to as much as 10 seconds each (our timeout value is
10 seconds). The freeze is not absolute: some requests are still
processed during it, but overall system performance drops.

Our current monitoring tools are htop, bmon, vmstat, iostat and lsof.

During normal operation, CPU is about 50%, IO utilization is about 10%,
about 4G of RAM is used for file caching, and network traffic is about
800KB/sec RX/TX.

During the freeze period, CPU is almost 0% on all nodes (!!!), with almost
the same IO utilization and no change in network traffic.

My first idea is Lucene's merge process, which should run in the background
without influencing overall system performance.
From performance measurements provided by other users on this mailing
list, it seems that 100 tps is not a particularly high value.

I have two questions:

1. What is the reason for the temporary slowdown, and how can I
   investigate it?
2. What is the reason for the slow performance (in my opinion)? Is it
   possible to get a sustained rate of 1000 tps per node?
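
One way to start investigating is to record per-request latency on the client side and group the slow requests into stall windows, so the freezes can be lined up against iostat or vmstat output. The following is a minimal sketch under stated assumptions: the function name, the 1000 ms "slow" threshold and the 5 second grouping gap are all hypothetical choices, not anything ES provides.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class StallWindow:
    start: float         # timestamp (seconds) of the first slow request
    end: float           # timestamp (seconds) of the last slow request
    slow_requests: int   # number of slow requests in the window


def find_stall_windows(
    samples: Iterable[Tuple[float, float]],
    slow_ms: float = 1000.0,    # assumed threshold for a "slow" request
    max_gap_s: float = 5.0,     # assumed gap that still joins two slow requests
) -> List[StallWindow]:
    """Group slow requests into stall windows.

    `samples` is an iterable of (timestamp_seconds, latency_ms) pairs,
    sorted by timestamp.
    """
    windows: List[StallWindow] = []
    current = None
    for ts, latency_ms in samples:
        if latency_ms < slow_ms:
            continue  # fast request, not part of any stall
        if current is not None and ts - current.end <= max_gap_s:
            # Close enough to the previous slow request: extend the window.
            current.end = ts
            current.slow_requests += 1
        else:
            # Start a new stall window.
            current = StallWindow(start=ts, end=ts, slow_requests=1)
            windows.append(current)
    return windows


if __name__ == "__main__":
    # Made-up sample data, only to show the call.
    demo = [(0.0, 80), (1.0, 120), (60.0, 9000), (62.0, 10000), (70.0, 90)]
    for w in find_stall_windows(demo):
        print(w)
```

The start and end of each window can then be compared with the timestamps of whatever the node was doing at that moment (disk activity, log entries, thread dumps).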

Please advise.

Thanks,
Vadim

vpunski · January 4, 2011, 8:49am

My mistake:
IO utilization during the freeze period is 100%. According to the strace
logs, there is a lot of activity in the ES data directory.

kimchy · January 5, 2011, 12:00pm

Which version of ES are you using?

What are strace logs? Do you mean exceptions?

Regarding the indexing speed in general, it depends on several factors, but
it is mainly driven by the size / complexity of the docs you index. There are
ways to optimize that; if you can post your current settings, I can suggest
some.

Regarding the freeze, my immediate thought is the flush process (on a shard
level) that happens once there are 5000 (default value) entries in the
transaction log. This basically calls Lucene commit, clears the transaction
log, and continues. Based on your description, it should be easy to write
a standalone test case that recreates it (a simple Main class) with a
single embedded ES node. I can then profile it and see where time is spent.
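
A quick plausibility check on the flush theory: if a flush is triggered per shard after a fixed number of operations, the time between flushes follows from the per-shard indexing rate. The sketch below assumes writes are spread evenly over the primary shards and uses the figures quoted in this thread (9 nodes, about 100 index ops/second per node, a 5000-entry threshold). The function name is made up for illustration, and the shard count is a parameter to fill in from the actual index settings.

```python
def seconds_between_flushes(
    nodes: int,
    ops_per_node_per_s: float,
    primary_shards: int,
    flush_threshold_ops: int = 5000,   # default quoted in the thread
) -> float:
    """Estimate how often one shard reaches the flush threshold.

    Assumes every node accepts `ops_per_node_per_s` index requests and that
    documents are routed evenly across `primary_shards` shards.
    """
    if min(nodes, ops_per_node_per_s, primary_shards, flush_threshold_ops) <= 0:
        raise ValueError("all inputs must be positive")
    cluster_ops_per_s = nodes * ops_per_node_per_s
    ops_per_shard_per_s = cluster_ops_per_s / primary_shards
    return flush_threshold_ops / ops_per_shard_per_s


if __name__ == "__main__":
    # 18 is the number_of_shards value from the configuration posted in this thread.
    print(seconds_between_flushes(nodes=9, ops_per_node_per_s=100, primary_shards=18))
```

With 18 primary shards this gives 100 seconds between flushes per shard, which falls inside the reported 30-second to 2-minute spacing of the freezes. That is consistent with the flush theory but does not prove it.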

vpunski · January 5, 2011, 12:11pm

By strace logs I mean the output of the Unix strace utility, showing file system activity:
strace -p PROCESS_ID -f -t -q -T -e trace=file -o strace.log
The log is full of elasticsearch/data/... activity (open, stat, lstat, unlink...)
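
Because the command above uses -T, every completed call in strace.log ends with the time spent in the syscall in angle brackets, so the log can be summarized per syscall to see which file operations dominate during a freeze. This is a minimal sketch that assumes the usual strace line layout for -f -t -T with -o (optional pid, HH:MM:SS, call, result, `<seconds>`); the function name is hypothetical and the paths in the test data would be whatever your own log contains.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Completed call:   [pid] HH:MM:SS name(args) = result <seconds>
_CALL = re.compile(
    r"^(?:\d+\s+)?\d{2}:\d{2}:\d{2}\s+(?P<name>\w+)\(.*=\s.*<(?P<secs>\d+\.\d+)>\s*$"
)
# Second half of a call that was reported as "<unfinished ...>" earlier.
_RESUMED = re.compile(r"<\.\.\. (?P<name>\w+) resumed>.*<(?P<secs>\d+\.\d+)>\s*$")


def summarize_strace(lines: Iterable[str]) -> Dict[str, Tuple[int, float]]:
    """Return {syscall: (call_count, total_seconds)} for timed lines.

    Lines without a trailing <seconds> field (for example the
    "<unfinished ...>" half of a split call) are skipped.
    """
    totals = defaultdict(lambda: [0, 0.0])
    for line in lines:
        match = _CALL.match(line) or _RESUMED.search(line)
        if match is None:
            continue
        entry = totals[match.group("name")]
        entry[0] += 1
        entry[1] += float(match.group("secs"))
    return {name: (count, secs) for name, (count, secs) in totals.items()}


if __name__ == "__main__":
    with open("strace.log") as log:   # file name from the command above
        summary = summarize_strace(log)
    # Print syscalls ordered by total time spent, largest first.
    for name, (count, secs) in sorted(summary.items(), key=lambda kv: -kv[1][1]):
        print(f"{name:10s} calls={count:8d} total={secs:.3f}s")
```

Running it separately on the portion of the log covering a freeze and on a normal period shows whether a particular call type (for example unlink or open) accounts for the stall.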

My mapping:

{
  "my_object": {
    "properties": {
      "creation_date": {
        "type": "date",
        "store": "yes",
        "omit_term_freq_and_positions" : true,
        "index" : "not_analyzed",
        "omit_norms" : true,
        "boost" : 1.0,
        "term_vector" : "no"
      },
      "modification_date": {
        "type": "date",
        "store": "yes",
        "omit_term_freq_and_positions" : true,
        "index" : "not_analyzed",
        "omit_norms" : true,
        "boost" : 1.0,
        "term_vector" : "no"
      },
      "key": {
        "type": "string",
        "store": "yes",
        "index": "not_analyzed",
        "omit_term_freq_and_positions" : true,
        "omit_norms" : true,
        "boost" : 1.0,
        "term_vector" : "no"
      },
      "parent": {
        "type": "string",
        "store": "yes",
        "index": "not_analyzed",
        "omit_term_freq_and_positions" : true,
        "omit_norms" : true,
        "boost" : 1.0,
        "term_vector" : "no"
      },
      "type": {
        "type": "string",
        "store": "yes",
        "index": "not_analyzed",
        "omit_term_freq_and_positions" : true,
        "omit_norms" : true,
        "boost" : 1.0,
        "term_vector" : "no"
      }
    }
  }
}

In addition to the mapped fields, about 10 dynamic text fields in
my_object are indexed, each about 20 characters long (simple text).

My elasticsearch.yml:

cluster.name : CMWELL_INDEX

path:
  home: data/elasticsearch
  logs: data/elasticsearch/logs

gateway:
  type: fs
  recover_after_nodes: 9
  recover_after_time: 2m
  expected_nodes: 9
  fs:
    location: data/elasticsearch/snapshot

index:
  gateway:
    type: fs
    snapshot_interval : 10s
  store:
    type: niofs
  number_of_shards : 18
  number_of_replicas : 1

kimchy · January 5, 2011, 12:35pm

First, you have configured a shared fs gateway, but it points to a local
path on each node. The idea of the shared gateway is to point to a shared
file system. You can simply remove it (the gateway.type and the index
gateway settings) and use the local gateway; see if that helps.

Second, which version of ES are you using?

vpunski · January 5, 2011, 1:48pm

I'm totally confused... What changes should I make?
I need a file-based index, as it will not fit into RAM.
I need the index to survive a full cluster restart.

Clinton_Gormley · January 5, 2011, 1:56pm

On Wed, 2011-01-05 at 15:48 +0200, Vadim Punski wrote:

I'm totally confused... What changes should I make?
I need a file-based index, as it will not fit into RAM.
I need the index to survive a full cluster restart.

By default, Elasticsearch uses the 'local' gateway, which is a file-based
persistent gateway, but sits on each node instead of in a single
shared filesystem.

The local gateway doubles as the work directory for each node.

It is a good deal more efficient than the shared gateway.

clint

vpunski · January 5, 2011, 2:22pm

No change when using local storage (see below)...
I think I've missed something in this configuration, as only the
"elasticsearch/data" directory is created.
Does this configuration imply "time machine" functionality and full
cluster restart?

cluster.name : MY_INDEX

path:
  home: data/elasticsearch
  logs: data/elasticsearch/logs

gateway:
  recover_after_nodes: 9
  recover_after_time: 2m
  expected_nodes: 9
  fs:
    location: data/elasticsearch/snapshot

What I see is that a global slowdown of about 5 seconds occurs, and the
translog file, 2.4MB in size, drops to 0 size.

-rw-r--r-- 1 2.4M 2011-01-05 16:14 translog-1294236579592

But how can working on a 2.4MB transaction file influence the system so much?
This seems like very strange behaviour to me.

Can you explain this?

Thanks

vpunski · January 6, 2011, 7:22am

Leaving the indexing process running overnight shows the same
periodic slowdown.
Any ideas?

Thanks

vpunski · January 6, 2011, 8:27am

From my short investigation of the ES + Lucene code:

Javadoc of org.apache.lucene.index.ConcurrentMergeScheduler

A MergeScheduler (http://lucene.apache.org/java/2_4_1/api/org/apache/lucene/index/MergeScheduler.html)
that runs each merge using a separate thread, up until a maximum number of
threads (setMaxThreadCount(int), http://lucene.apache.org/java/2_4_1/api/org/apache/lucene/index/ConcurrentMergeScheduler.html#setMaxThreadCount(int))
at which when a merge is needed, the thread(s) that are updating the index
will pause until one or more merges completes. This is a simple way to use
concurrency in the indexing process without having to create and manage
application level threads.
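
To make the pausing behaviour in that Javadoc concrete, here is a toy model of it in Python. It is not Lucene's implementation: the class name, the `stalls` counter and the use of a condition variable are all assumptions made for illustration. Merges run on their own threads up to a limit, and a caller that needs another merge while the limit is reached blocks until a running merge finishes.

```python
import threading
from typing import Callable


class ToyMergeScheduler:
    """Toy model: at most `max_thread_count` merges run at the same time."""

    def __init__(self, max_thread_count: int) -> None:
        self._max = max_thread_count
        self._running = 0
        self._cond = threading.Condition()
        self.stalls = 0  # how many times a caller had to wait for a slot

    def merge(self, work: Callable[[], object]) -> threading.Thread:
        """Called from an indexing thread when a merge is needed."""
        with self._cond:
            if self._running >= self._max:
                self.stalls += 1
                # The indexing thread pauses here until a merge completes.
                while self._running >= self._max:
                    self._cond.wait()
            self._running += 1
        thread = threading.Thread(target=self._run, args=(work,))
        thread.start()
        return thread

    def _run(self, work: Callable[[], object]) -> None:
        try:
            work()
        finally:
            with self._cond:
                self._running -= 1
                self._cond.notify_all()  # wake any paused indexing thread


if __name__ == "__main__":
    import time

    scheduler = ToyMergeScheduler(max_thread_count=1)
    merges = [scheduler.merge(lambda: time.sleep(0.2)) for _ in range(3)]
    for m in merges:
        m.join()
    print("indexing thread stalled", scheduler.stalls, "times")
```

In this model a stalled indexing thread uses no CPU while it waits, so if merges are slow because the disk is saturated, indexing would appear frozen with low CPU and high IO utilization. Whether that is what happens in this cluster still needs to be confirmed, for example with thread dumps taken during a freeze.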

Can you explain the threading model of the merge life cycle in
ES/Lucene? Could it be the reason for the issue I have?

Thanks,

Vadim
