and created 20 different indexes by changing translog, merge, refresh, etc. settings, while "number_of_shards": 1, "number_of_replicas": 0 is common to all of them.
I found that the best-performing index was actually the default settings plus refresh_interval: "10s" (not refresh_interval: "-1") in terms of doc count accuracy, load average, and a smaller bulk queue.
Can anyone explain why this setting performs best?
In addition, I can observe that when the refresh thread is active, the segment count decreases. What exactly is the refresh thread doing?
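For reference, a minimal sketch of the kind of per-index settings being compared, using the Python client; the host, the index names, and the old-style body= call are illustrative assumptions, not the actual benchmark code:

```python
# Minimal sketch (elasticsearch-py, old-style body= API as used in the ES 1.x era).
# Host, index names, and values are made up for illustration.
from elasticsearch import Elasticsearch

es = Elasticsearch(["localhost:9200"])

# One index per settings combination; only refresh_interval differs here.
for name, refresh in [("test_refresh_10s", "10s"), ("test_refresh_off", "-1")]:
    es.indices.create(
        index=name,
        body={
            "settings": {
                "number_of_shards": 1,
                "number_of_replicas": 0,
                "refresh_interval": refresh,
            }
        },
    )
```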
You should see better performance with -1 refresh_interval, because Lucene
will flush larger, single segments, causing less merging pressure.
Are both of your tests (-1 vs 10s) fully saturating CPU and/or IO on your
nodes?
If not, then that can explain it: when you have 10s refresh_interval, a
separate thread (refresh thread) bears the cost of moving the new segments
to disk, but with -1, the bulk index threads themselves bear the cost.
But if you test with enough client-side concurrency to saturate your
resources you should see the opposite (-1 refresh_interval is faster
indexing throughput).
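As a concrete illustration of how refresh_interval: "-1" is usually applied (disable refresh only for the duration of the bulk load, then restore it), here is a hedged sketch; the index name and intervals are placeholders:

```python
# Sketch of the "disable refresh while bulk loading" pattern described above.
# Index name and intervals are placeholders, not recommendations.
from elasticsearch import Elasticsearch

es = Elasticsearch(["localhost:9200"])
index = "test_refresh_off"

# Turn off periodic refresh before the bulk load...
es.indices.put_settings(index=index, body={"index": {"refresh_interval": "-1"}})

# ... run the bulk indexing here ...

# ... then restore a periodic refresh and force one so new docs become searchable.
es.indices.put_settings(index=index, body={"index": {"refresh_interval": "10s"}})
es.indices.refresh(index=index)
```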
I guess you are right. CPU usage was about 10~20% (of 40 CPU cores) and the load average was about 10. I might as well test again under conditions that use about 50~70% of the CPUs while keeping the load average lower.
However, do you have any idea how I can use more CPU when indexing? I'm increasing the input volume, but CPU usage stays almost the same (and the processing speed seems almost the same).
Should I configure something like the bulk thread pool size or "indices.memory.max_shard_index_buffer_size"?
Hmm maybe your nodes are IO bound? What IO system are you using?
You should not need to increase the default bulk thread pool size, and if
you are using default 5 shards then each single bulk request to one index
is done concurrently 5X so you only need enough concurrent bulk requests to
saturate the number of CPUs, e.g. 40 / 5 = 8 concurrent bulk indexing
clients.
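A rough sketch of what "enough concurrent bulk indexing clients" could look like on the client side; the worker count, batch size, and generated documents are placeholders, not measured values:

```python
# Rough sketch: several concurrent bulk clients against one index, as suggested
# above (e.g. ~8 clients for 40 CPUs / 5 shards). Worker count, batch size, and
# the document generator are placeholders for illustration.
from concurrent.futures import ThreadPoolExecutor
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch(["localhost:9200"])

def make_actions(worker_id, num_docs=10000):
    for i in range(num_docs):
        yield {
            "_index": "test_refresh_off",
            "_type": "doc",  # _type was still required on 1.x-era clusters; drop on modern versions
            "_source": {"worker": worker_id, "seq": i},
        }

def run_worker(worker_id):
    # helpers.bulk sends the generated actions in batches of chunk_size docs per request.
    return helpers.bulk(es, make_actions(worker_id), chunk_size=1000)

with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(run_worker, range(8)))
```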
Possibly it is IO bound, but I don't see much iowait on the CPUs or write activity in iostat. By the way, it uses SSDs, xfs as the file system, and the default Directory (I think that resolves to MMapDirectory).
Regarding "each single bulk request to one index is done concurrently 5X so you only need enough concurrent bulk requests to saturate the number of CPUs": I suppose that IndexWriter will lock at some point, but will this strategy work on the same index?
However, setting index.merge.async_interval higher than the default "1s" seems better for huge indexing (I'm still using 1.4.0). I found that it was removed in the 1.5.0 release. Do you know why? Will I see better indexing performance simply by upgrading to >= 1.5.0?
Local SSD (not e.g. Amazon's EBS backed by SSD)? Is this dedicated
hardware or virtual? Dedicated is better.
Yes, for one index ES creates 5 shards by default, so a single bulk request
indexing N docs will effectively use 5 CPUs assuming docs are routed evenly.
Please don't change index.merge.async_interval: it's a bad idea. By increasing it, you are delaying when Lucene gets a chance to kick off segment merging.
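If it helps to see the effect that was mentioned earlier (segment counts rising as refresh writes new segments and falling as merges fold them together), here is a hedged sketch that polls the segments API while indexing runs; the index name and polling interval are placeholders:

```python
# Sketch: poll per-shard segment counts while indexing, to watch refresh create
# small segments and merges collapse them again. Index name and interval are
# placeholders for illustration.
import time
from elasticsearch import Elasticsearch

es = Elasticsearch(["localhost:9200"])
index = "test_refresh_off"

for _ in range(10):
    info = es.indices.segments(index=index)
    shards = info["indices"][index]["shards"]
    # Each shard entry is a list of shard copies, each with a dict of segments.
    total = sum(len(copy["segments"]) for copies in shards.values() for copy in copies)
    print("segments:", total)
    time.sleep(10)
```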