Setup:
4 nodes
Replication = 0
ES_HEAP_SIZE = 75GB
Number of Indices = 59 (using logstash, one index per month)
Total shards = 234 (each index has 4 shards, one per node)
Total docs = 7.4 billion
Total size = 4.7TB
When I add a new file, which I do using logstash on all four nodes, the
indexing immediately throttles. For instance:
Where should I be looking to tune the indexing performance? The query
load on the cluster is very low, as it is a research cluster, so I would
happily sacrifice query performance for indexing.
The 4 nodes all run logstash, listening on various ports. I use netcat to
'feed' the data to the 4 nodes from a Hadoop cluster.
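To give an idea of the plumbing, the Logstash config on each node is roughly along these lines (the port, codec and index pattern here are illustrative, not the exact config):

  input {
    tcp {
      port  => 5000            # netcat pushes lines to this port
      codec => json_lines      # assuming one JSON document per line from the Hadoop export
    }
  }
  output {
    elasticsearch {
      host     => "localhost"            # each node indexes into its local ES instance
      protocol => "http"
      index    => "logstash-%{+YYYY.MM}" # one index per month
    }
  }

and the feed from the Hadoop side is basically just:

  cat part-00000 | nc es-node1 5000    # file name and host are placeholders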
Each ES node has 24 disks but I am only using one at the moment. This is an
obvious IO bottleneck, but I am unclear how to use all the disks. If I add more
disks, will ES share the data between them all? e.g. /mnt/disk1, /mnt/disk2, etc.
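In other words, I am guessing the answer is to list all the mounts in elasticsearch.yml, something like the sketch below, but I don't know what ES does with the existing data when paths are added (mount points are illustrative):

  # elasticsearch.yml -- one data path per disk
  path.data: /mnt/disk1,/mnt/disk2,/mnt/disk3    # ...and so on for the remaining mounts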
Try disabling merge IO throttling, especially if your index is on SSDs.
(It's on by default at a paltry 20 MB/sec.) Merge IO throttling causes
merges to run slowly, which eventually causes them to back up to the
point where indexing must be throttled...
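I believe on 1.x this is a dynamic cluster setting, so something along these lines should do it (point it at any node in the cluster):

  curl -XPUT 'localhost:9200/_cluster/settings' -d '{
    "transient": {
      "indices.store.throttle.type": "none"
    }
  }'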
Also see the recent post about tuning to favor indexing throughput:
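One of the usual knobs in that vein (not necessarily what the post covers) is relaxing the refresh interval on the indices being written to, e.g. per index (the index name here is just an example):

  curl -XPUT 'localhost:9200/logstash-2014.09/_settings' -d '{
    "index": {
      "refresh_interval": "30s"
    }
  }'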
Good point on heap, so I will bring that back down to 30GB
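(For reference, I assume that just means changing the heap line in /etc/sysconfig/elasticsearch or /etc/default/elasticsearch, depending on how the package was installed, and restarting each node:)

  ES_HEAP_SIZE=30g    # at or below ~30GB so compressed oops stay enabled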
Versions:
ES 1.3.2-1
java 1.7.0_67
I definitely want to start using all 12 disks, rather than the 1 at the
moment! If I add paths for the other 11 disks and restart, will ES do any
'rebalancing'? If it won't, is there any way to spread the existing data across
all 12 disks? I really don't want to re-index everything!
Thanks
On Thursday, September 18, 2014 10:03:18 AM UTC+1, Mark Walkom wrote:
Also, given you're over a 32GB heap, your Java pointers aren't going to be
compressed, which means GC will suffer.
You haven't mentioned what ES and Java versions you are using, which would
be useful.
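If you want to confirm whether compressed oops are actually in use at a given heap size, something along these lines should show it on a HotSpot JVM:

  java -Xmx75g -XX:+PrintFlagsFinal -version | grep UseCompressedOops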
I have now enabled all 12 disks per machine, so going forward I will get
some "sharing" across all disks. I'm not sure how it will allocate new data
across the disks, though.
If I move a shard from one node to another with the new 12-disk paths, will
the receiving node "share" the data across the disks? That way I could move
all shards and get a redistribution of existing data?
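To be concrete, I was thinking of the cluster reroute API for the moves, roughly like this (the index, shard number and node names are placeholders):

  curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
    "commands": [
      { "move": { "index": "logstash-2014.08", "shard": 0,
                  "from_node": "node1", "to_node": "node2" } }
    ]
  }'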