and created 20 different indexes by changing translog, merge, refresh, etc. settings, while "number_of_shards": 1, "number_of_replicas": 0 is common to all of them.
I found that the best-performing index was actually the default settings plus refresh_interval: "10s" (not refresh_interval: "-1") in terms of doc count accuracy, load average, and a smaller bulk queue.
Can anyone explain why this setting performs best?
In addition, I can observe that when the refresh thread is active, the segment count decreases. What exactly is the refresh thread doing?
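For reference, a minimal sketch of the kind of per-index settings being compared, using the Python client; the host, the index names, and the old-style body= call are illustrative assumptions, not the actual benchmark code:

```python
# Minimal sketch (elasticsearch-py, old-style body= API as used in the ES 1.x era).
# Host, index names, and values are made up for illustration.
from elasticsearch import Elasticsearch

es = Elasticsearch(["localhost:9200"])

# One index per settings combination; only refresh_interval differs here.
for name, refresh in [("test_refresh_10s", "10s"), ("test_refresh_off", "-1")]:
    es.indices.create(
        index=name,
        body={
            "settings": {
                "number_of_shards": 1,
                "number_of_replicas": 0,
                "refresh_interval": refresh,
            }
        },
    )
```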
You should see better performance with -1 refresh_interval, because Lucene
will flush larger, single segments, causing less merging pressure.
Are both of your tests (-1 vs 10s) fully saturating CPU and/or IO on your
nodes?
If not, then that can explain it: when you have 10s refresh_interval, a
separate thread (refresh thread) bears the cost of moving the new segments
to disk, but with -1, the bulk index threads themselves bear the cost.
But if you test with enough client-side concurrency to saturate your
resources you should see the opposite (-1 refresh_interval is faster
indexing throughput).
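As a concrete illustration of how refresh_interval: "-1" is usually applied (disable refresh only for the duration of the bulk load, then restore it), here is a hedged sketch; the index name and intervals are placeholders:

```python
# Sketch of the "disable refresh while bulk loading" pattern described above.
# Index name and intervals are placeholders, not recommendations.
from elasticsearch import Elasticsearch

es = Elasticsearch(["localhost:9200"])
index = "test_refresh_off"

# Turn off periodic refresh before the bulk load...
es.indices.put_settings(index=index, body={"index": {"refresh_interval": "-1"}})

# ... run the bulk indexing here ...

# ... then restore a periodic refresh and force one so new docs become searchable.
es.indices.put_settings(index=index, body={"index": {"refresh_interval": "10s"}})
es.indices.refresh(index=index)
```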
I guess you are right. CPU usage was about 10~20% (of 40 CPU cores) and the load average was about 10. I might as well test again under conditions that use about 50~70% of the CPUs while keeping the load average lower.
However, do you have any idea how I can use more CPU when indexing? I'm increasing the input volume, but CPU usage stays almost the same (and the processing speed seems almost the same).
Should I configure something like the bulk thread pool size or "indices.memory.max_shard_index_buffer_size"?
Hmm maybe your nodes are IO bound? What IO system are you using?
You should not need to increase the default bulk thread pool size, and if
you are using default 5 shards then each single bulk request to one index
is done concurrently 5X so you only need enough concurrent bulk requests to
saturate the number of CPUs, e.g. 40 / 5 = 8 concurrent bulk indexing
clients.
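A rough sketch of what "enough concurrent bulk indexing clients" could look like on the client side; the worker count, batch size, and generated documents are placeholders, not measured values:

```python
# Rough sketch: several concurrent bulk clients against one index, as suggested
# above (e.g. ~8 clients for 40 CPUs / 5 shards). Worker count, batch size, and
# the document generator are placeholders for illustration.
from concurrent.futures import ThreadPoolExecutor
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch(["localhost:9200"])

def make_actions(worker_id, num_docs=10000):
    for i in range(num_docs):
        yield {
            "_index": "test_refresh_off",
            "_type": "doc",  # _type was still required on 1.x-era clusters; drop on modern versions
            "_source": {"worker": worker_id, "seq": i},
        }

def run_worker(worker_id):
    # helpers.bulk sends the generated actions in batches of chunk_size docs per request.
    return helpers.bulk(es, make_actions(worker_id), chunk_size=1000)

with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(run_worker, range(8)))
```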
Possibly it is IO bound, but I don't see much iowait on the CPUs or write activity in iostat. By the way, it uses SSDs, xfs as the file system, and the default Directory (I think that resolves to MMapDirectory).
Regarding "each single bulk request to one index is done concurrently 5X so you only need enough concurrent bulk requests to saturate the number of CPUs": I suppose that IndexWriter will lock at some point, but will this strategy work on the same index?
However, setting index.merge.async_interval higher than the default "1s" seems better for huge indexing (I'm still using 1.4.0). I found that it was removed in the 1.5.0 release. Do you know why? Will I see better indexing performance simply by upgrading to >= 1.5.0?
Local SSD (not e.g. Amazon's EBS backed by SSD)? Is this dedicated
hardware or virtual? Dedicated is better.
Yes, for one index ES creates 5 shards by default, so a single bulk request
indexing N docs will effectively use 5 CPUs assuming docs are routed evenly.
Please don't change index.merge.async_interval: it's a bad idea. By increasing it, you are delaying when Lucene gets a chance to kick off segment merging.
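If it helps to see the effect that was mentioned earlier (segment counts rising as refresh writes new segments and falling as merges fold them together), here is a hedged sketch that polls the segments API while indexing runs; the index name and polling interval are placeholders:

```python
# Sketch: poll per-shard segment counts while indexing, to watch refresh create
# small segments and merges collapse them again. Index name and interval are
# placeholders for illustration.
import time
from elasticsearch import Elasticsearch

es = Elasticsearch(["localhost:9200"])
index = "test_refresh_off"

for _ in range(10):
    info = es.indices.segments(index=index)
    shards = info["indices"][index]["shards"]
    # Each shard entry is a list of shard copies, each with a dict of segments.
    total = sum(len(copy["segments"]) for copies in shards.values() for copy in copies)
    print("segments:", total)
    time.sleep(10)
```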