Refresh_interval:"10s" is better than refresh_interval:"-1"?

Hi,

I'm trying to improve indexing performance. I followed these instructions:

https://www.elastic.co/blog/performance-considerations-elasticsearch-indexing

http://www.elastic.co/guide/en/elasticsearch/guide/master/indexing-performance.html

and created 20 different indexes, varying the translog, merge, refresh, etc. settings
while keeping "number_of_shards:1,number_of_replicas:0" common to all of them.
I found that the best-performing index was actually the default settings plus
refresh_interval:"10s" (not refresh_interval:"-1"), in terms of doc count
accuracy, load average, and a smaller bulk queue.

Can anyone explain why this setting is the best?

In addition, I can observe that when the refresh thread is active, the segment count
decreases. What exactly is the refresh thread doing?

thanks

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAHm3ZsrZNVJtOj__YO6enFqKDt4T1Hxi_pT94W9YQx7bNe%3Dg1g%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

This is largely dependent on your setup - node size and config, hardware,
doc sizes, query type, data structure.

--

You should see better performance with -1 refresh_interval, because Lucene
will flush larger, single segments, causing less merging pressure.

Are both of your tests (-1 vs 10s) fully saturating CPU and/or IO on your
nodes?

If not, then that can explain it: when you have 10s refresh_interval, a
separate thread (refresh thread) bears the cost of moving the new segments
to disk, but with -1, the bulk index threads themselves bear the cost.

But if you test with enough client-side concurrency to saturate your
resources, you should see the opposite: a refresh_interval of -1 gives faster
indexing throughput.

Mike McCandless
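
For readers who want to reproduce the -1 vs 10s comparison discussed above, here is a minimal sketch of how the setting could be toggled through the index settings endpoint (PUT <index>/_settings). The host `http://localhost:9200` and the index name `myindex` are assumptions for illustration, and the code only builds the requests; nothing is sent unless you call `urlopen` yourself.

```python
import json
import urllib.request


def refresh_settings_request(base_url, index, interval):
    """Build (but do not send) a PUT <index>/_settings request
    that changes refresh_interval for one index."""
    body = json.dumps({"index": {"refresh_interval": interval}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical host and index name, for illustration only.
disable = refresh_settings_request("http://localhost:9200", "myindex", "-1")
restore = refresh_settings_request("http://localhost:9200", "myindex", "10s")

print(disable.get_method(), disable.full_url, disable.data.decode())
# To actually apply a change: urllib.request.urlopen(disable)
```

A common pattern is to apply the "-1" request before a large bulk load and the "10s" request afterwards, so that newly indexed documents become searchable again.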

--

Hi Mike,

I guess you are right. CPU usage was about 10~20% (of 40 CPU cores) and the load
average was about 10. I might as well test again under conditions that use
about 50~70% of CPU while keeping the load average low.

However, do you have any idea how I can use more CPU when indexing? I'm
increasing the input volume, but CPU usage stays almost the same (and the
processing speed seems almost the same too).
Should I configure something like the "bulk.thread_pool" size or
"indices.memory.max_shard_index_buffer_size"
(https://github.com/elastic/elasticsearch/blob/97559c0614d900a682d01afc241615cf5627fb4c/src/main/java/org/elasticsearch/indices/memory/IndexingMemoryController.java#L96)?

--

Hmm maybe your nodes are IO bound? What IO system are you using?

You should not need to increase the default bulk thread pool size, and if
you are using default 5 shards then each single bulk request to one index
is done concurrently 5X so you only need enough concurrent bulk requests to
saturate the number of CPUs, e.g. 40 / 5 = 8 concurrent bulk indexing
clients.

Mike McCandless
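
The 40 / 5 = 8 arithmetic above can be sketched as a small calculation plus a thread pool that keeps that many bulk requests in flight. This is only an illustration: `send_bulk` is a hypothetical placeholder that does no network I/O, and the batch contents are dummy data.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def bulk_clients_needed(cpu_cores, shards_per_index):
    """Concurrent bulk requests needed so that
    (requests in flight x shards per index) covers all CPU cores."""
    return math.ceil(cpu_cores / shards_per_index)


def send_bulk(batch):
    """Hypothetical placeholder: a real client would POST the batch
    to the _bulk endpoint. Here we just report how many docs it held."""
    return len(batch)


clients = bulk_clients_needed(40, 5)  # 40 cores / 5 shards = 8 clients
batches = [[{"n": i}] * 1000 for i in range(16)]  # 16 dummy batches of 1000 docs

# Keep `clients` bulk requests in flight at once.
with ThreadPoolExecutor(max_workers=clients) as pool:
    indexed = sum(pool.map(send_bulk, batches))

print(clients, indexed)  # prints: 8 16000
```

With "number_of_shards:1", as in the original test, the same formula gives 40 concurrent clients, which may explain why CPU usage stayed low with only a few clients.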

--

It is possibly IO bound, but I don't see much IO wait on the CPU or much write
activity in iostat. By the way, I'm using SSDs with XFS as the file system and the
default Directory (I think it becomes MMapDirectory).

> each single bulk request to one index is done concurrently 5X so you only
> need enough concurrent bulk requests to saturate the number of CPUs

I suppose that IndexWriter will lock at some point, but will this strategy
work on the same index?

However, setting index.merge.async_interval higher than the default "1s" seems
better for heavy indexing (I'm still using 1.4.0). I found that it was
removed in the recent 1.5.0 release. Do you know why? Will I see better
indexing performance simply by upgrading to >=1.5.0?

--

May I ask, since you are seeking better indexing performance, what your current
performance is? How many nodes (= hardware machines) do you have?

Jörg

--

On Tue, Apr 14, 2015 at 7:36 AM, Hajime placeofnomemories@gmail.com wrote:

> It is possibly IO bound, but I don't see much IO wait on the CPU or much write
> activity in iostat. By the way, I'm using SSDs with XFS as the file system and the
> default Directory (I think it becomes MMapDirectory).

Local SSD (not e.g. Amazon's EBS backed by SSD)? Is this dedicated
hardware or virtual? Dedicated is better.

>> each single bulk request to one index is done concurrently 5X so you
>> only need enough concurrent bulk requests to saturate the number of CPUs
>
> I suppose that IndexWriter will lock at some point, but will this strategy
> work on the same index?

Yes, for one index ES creates 5 shards by default, so a single bulk request
indexing N docs will effectively use 5 CPUs assuming docs are routed evenly.

> However, setting index.merge.async_interval higher than the default "1s"
> seems better for heavy indexing (I'm still using 1.4.0). I found that it
> was removed in the recent 1.5.0 release. Do you know why? Will I see
> better indexing performance simply by upgrading to >=1.5.0?

Please don't change that setting: it's a bad idea. By increasing it, you
are delaying when Lucene gets a chance to kick off segment merging.

I would recommend upgrading.

Mike McCandless
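
To illustrate the point above that one bulk request is spread over the index's shards, here is a rough sketch of hash-based routing. Elasticsearch documents its routing as a hash of the routing value (the document id by default) modulo the number of primary shards; the CRC32 hash used here is only a stand-in for illustration, not the hash Elasticsearch actually uses, and the document ids are made up.

```python
import zlib


def shard_for(doc_id, num_shards):
    """Stand-in routing: hash the id and take it modulo the shard count.
    CRC32 is an assumption for illustration, not Elasticsearch's real hash."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def split_bulk(doc_ids, num_shards=5):
    """Group the ids of one bulk request by the shard they would route to."""
    groups = {shard: [] for shard in range(num_shards)}
    for doc_id in doc_ids:
        groups[shard_for(doc_id, num_shards)].append(doc_id)
    return groups


# Made-up ids standing in for the N docs of a single bulk request.
groups = split_bulk([f"doc-{i}" for i in range(10_000)])
for shard, ids in sorted(groups.items()):
    print(shard, len(ids))
```

Each group can then be indexed by its shard independently, which is why a single bulk request can keep up to 5 CPUs busy when the ids hash evenly.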
