Garbage collection pauses causing cluster to get unresponsive


(Srinath C) #1

Hi,
I'm having a tough time keeping ElasticSearch running healthily for even
20-30 minutes in my setup. At an indexing rate of 28-36K documents per
second, the CPU utilization soon drops to 100% and never recovers. All
client requests fail with UnavailableShardsException or a "No Nodes"
exception. The logs show warnings from "monitor.jvm" saying that GC did not
free up much memory.

The ultimate requirement is to import data into the ES cluster at around
60K documents per second on the setup described below. The only operation
being performed is bulk import of documents. The ES nodes soon become
unresponsive and the CPU utilization drops to 100% (from 400-500%). They
don't seem to recover even after the bulk imports have stopped.

Any suggestions on how to tune the GC based on my requirements? What
other information would be needed to look into this?

Thanks,
Srinath.

The setup:

  • Cluster: a 4-node cluster of c3.2xlarge instances on AWS EC2.
  • Load: The only operation during this test is bulk import of data. The
    documents are small (roughly 200-500 bytes each) and are bulk imported
    into the cluster using Storm.
  • Bulk Import: A total of 7-9 Storm workers, each using a single
    BulkProcessor to import data into the ES cluster. The logs show each
    worker importing around 4K docs per second, i.e. around 28-36K docs per
    second in total going into ES.
  • JVM Args: Around 8G of heap; tried the CMS collector as well as the G1
    collector.
  • ES configuration:
    • "mlockall": true
    • "threadpool.bulk.size": 20
    • "threadpool.bulk.queue_size": 500
    • "indices.memory.index_buffer_size": "50%"
    • "index.refresh_interval": "30s"
    • "index.merge.policy.segments_per_tier": 100

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAHhx-GLy1mftUtFT6eyrnuzcpNTu%3DDt3maj3YnuEdYKP4NaYWA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


(Mark Walkom) #2

How much data does that all come out to be, GB and doc count?
How many indexes and how many shards per index?

60k/s is pretty high volume!

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com



(Jörg Prante) #3

index.merge.policy.segments_per_tier: 100 and threadpool.bulk.queue_size:
500 are extreme settings that should be avoided because they consume a lot
of resources. The UnavailableShardsException / NoNodes errors you see are
congestion caused by such extreme values.

What ES version is this? Why don't you use the default settings?

Jörg



(Srinath C) #4

Each document is around 300 bytes on average, which brings the data rate
to around 17 MB per sec at the 60K target.
This is running on ES version 1.1.1. I have been trying out different
values for these configurations. queue_size was increased when I got
EsRejectedExecutionException because the queue filled up (default size of 50).
segments_per_tier was picked up from some articles on scaling. What would
be a reasonable value based on my data rate?
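
The data-rate figure above can be checked with a quick calculation. This is only a back-of-the-envelope sketch: it uses the document size and indexing rates quoted in this thread, counts raw document payload only (no bulk metadata, replication or index overhead), and the function name is made up for illustration.

```python
# Rough ingest-rate estimate from the figures quoted in this thread.
AVG_DOC_BYTES = 300            # average document size reported above
TARGET_DOCS_PER_SEC = 60_000   # target indexing rate


def data_rate_mib_per_sec(docs_per_sec, doc_bytes=AVG_DOC_BYTES):
    """Raw payload rate in MiB/s (1 MiB = 1024 * 1024 bytes)."""
    return docs_per_sec * doc_bytes / (1024 * 1024)


if __name__ == "__main__":
    # Current observed range (28-36K docs/s) and the 60K docs/s target.
    for rate in (28_000, 36_000, TARGET_DOCS_PER_SEC):
        print(f"{rate:>6} docs/s -> {data_rate_mib_per_sec(rate):.1f} MiB/s")
```

At the 60K target this comes out at a little over 17 MiB/s of raw documents, which matches the figure given above.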

If 60K seems to be too high are there any benchmarks available for
ElasticSearch?

Thanks all for your replies.



(Srinath C) #5

Hi Mark,
The total targeted data rate is around 17 MB per sec. The expected
number of indices is around 50, with each index having 10 shards.

Regards,
Srinath.



(Jörg Prante) #6

A better approach than increasing queue_size like crazy, which hits the
server resources very hard, is to set up a reasonable bulk request length
(say 1k or 10k documents of 300 bytes each) and a bulk concurrency that is
proportional to the available CPU (say 16) or to network bandwidth (if your
indexing is not CPU bound).

If a concurrency limit is exceeded, the client must wait for outstanding
responses from the cluster before continuing. See BulkProcessor class. By
doing this, you can avoid "rejected execution" / "no node" / "shards not
available" exceptions.
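
The idea of a client-side concurrency limit can be sketched roughly as follows. This is not the BulkProcessor implementation, only an illustration of the mechanism in Python: `BoundedBulkSender` and the `send_bulk` callable are hypothetical names, the defaults (16 requests in flight, 1000 documents per batch) are just the example figures from above, and the sketch assumes a single producer thread calling `add()`.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class BoundedBulkSender:
    """Caps the number of bulk requests in flight; flush() blocks at the cap."""

    def __init__(self, send_bulk, max_in_flight=16, batch_size=1000):
        self._send_bulk = send_bulk  # hypothetical callable: list of docs -> None
        self._slots = threading.Semaphore(max_in_flight)
        self._pool = ThreadPoolExecutor(max_workers=max_in_flight)
        self._batch_size = batch_size
        self._batch = []
        self.errors = []             # exceptions raised by send_bulk, if any

    def add(self, doc):
        """Buffer one document; send a bulk request once the batch is full."""
        self._batch.append(doc)
        if len(self._batch) >= self._batch_size:
            self.flush()

    def flush(self):
        """Send the buffered batch, waiting first if too many are outstanding."""
        if not self._batch:
            return
        batch, self._batch = self._batch, []
        self._slots.acquire()        # back-pressure: wait for a free slot
        future = self._pool.submit(self._send_bulk, batch)
        future.add_done_callback(self._done)

    def _done(self, future):
        # Free the slot whether the request succeeded or failed.
        self._slots.release()
        if future.exception() is not None:
            self.errors.append(future.exception())

    def close(self):
        """Send any remaining documents and wait for all requests to finish."""
        self.flush()
        self._pool.shutdown(wait=True)
```

Because the producer blocks instead of queueing more work on the server, the cluster's bulk queue does not need to be enlarged to absorb bursts.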

Another area where throughput can be gained is index store throttling;
maybe you have disabled it already. The default is 20mb/sec; whether that
is appropriate depends on your I/O drives.

I would love a pointer to the article on scaling with
index.merge.policy.segments_per_tier: 100. This is definitely not scaling
at all.

Use a value of 4 or 5 in combination with concurrent merge scheduling, plus
a small maximum segment size like 1g, and a fixed number of merge threads
(4) so that merges can keep up with bulk indexing. See also

Jörg
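
As a rough illustration of the merge settings suggested above, the snippet below builds a create-index request body. It is a sketch under assumptions: the setting names are the ES 1.x ones as far as I know them, the value 5 is one of the two suggested values, the shard count of 10 is the figure mentioned earlier in the thread, and the helper name is made up. Check the names against the documentation for your version before using them.

```python
import json

# Merge-related index settings from the advice above (ES 1.x setting names).
MERGE_SETTINGS = {
    "index.merge.policy.segments_per_tier": 5,        # suggested: 4 or 5
    "index.merge.policy.max_merged_segment": "1g",    # small maximum segment size
    "index.merge.scheduler.max_thread_count": 4,      # fixed number of merge threads
}


def create_index_body(number_of_shards=10):
    """Return a JSON body for creating an index with the merge settings applied."""
    settings = {"index.number_of_shards": number_of_shards}
    settings.update(MERGE_SETTINGS)
    return json.dumps({"settings": settings}, indent=2)


if __name__ == "__main__":
    print(create_index_body())
```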



(Michael McCandless) #7

First off, upgrade ES to the latest (1.2.2) release; there have been a
number of bulk indexing improvements since 1.1.

Second, disable merge IO throttling.

Third, use the default settings, but increase index.refresh_interval to
perhaps 5s, and set index.translog.flush_threshold_ops to maybe 50000: this
decreases the frequency of Lucene level commits (= filesystem fsyncs).
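
The second and third points could look roughly like this as settings-update requests. Treat it as a sketch rather than a tested recipe: the index name `events` is hypothetical, the values are the tentative ones suggested above, and `indices.store.throttle.type` is, as far as I know, the ES 1.x cluster setting that controls merge IO throttling.

```python
import json

INDEX_NAME = "events"  # hypothetical index name, not from this thread

# Index-level settings suggested above.
index_settings = {
    "index": {
        "refresh_interval": "5s",
        "translog.flush_threshold_ops": 50000,
    }
}

# Cluster-level setting that turns off store (merge) IO throttling.
cluster_settings = {
    "transient": {"indices.store.throttle.type": "none"}
}


def as_requests(index_name):
    """Return (method, path, JSON body) tuples for the two settings updates."""
    return [
        ("PUT", f"/{index_name}/_settings", json.dumps(index_settings)),
        ("PUT", "/_cluster/settings", json.dumps(cluster_settings)),
    ]


if __name__ == "__main__":
    for method, path, body in as_requests(INDEX_NAME):
        print(method, path, body)
```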

If possible, use SSDs: they are much faster for merging.

Mike McCandless

http://blog.mikemccandless.com



(Srinath C) #8

Thanks Joe, Michael and all. I really appreciate your help.
I'll try out your suggestions and rerun the tests, and will post back on
my progress.



(Srinath C) #9

Thanks Joe, Michael and all. Really appreciate your help.
I'll try out your suggestions and run the tests. Will post back on
my progress.



(Srinath C) #10

Hi Joe/Michael,
I tried all your suggestions and found a remarkable difference in the
way Elasticsearch handles the bulk indexing.
Right now, I'm able to ingest at a rate of 25K docs per second with the
same setup. But occasionally some EsRejectedExecutionExceptions are still
raised. The CPU utilization on the Elasticsearch nodes is so low (around
200% on an 8-core system) that it seems something else is wrong. I have
also tried increasing queue_size, but that just delays the
EsRejectedExecutionExceptions.

Any more suggestions on how to handle this?

Current setup: 4 c3.2xlarge instances of ES 1.2.2.
Current Configurations:
index.codec.bloom.load: false
index.compound_format: false
index.compound_on_flush: false
index.merge.policy.max_merge_at_once: 4
index.merge.policy.max_merge_at_once_explicit: 4
index.merge.policy.max_merged_segment: 1gb
index.merge.policy.segments_per_tier: 4
index.merge.policy.type: tiered
index.merge.scheduler.max_thread_count: 4
index.merge.scheduler.type: concurrent
index.refresh_interval: 10s
index.translog.flush_threshold_ops: 50000
index.translog.interval: 10s
index.warmer.enabled: false
indices.memory.index_buffer_size: 50%
indices.store.throttle.type: none
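
For reference, a minimal sketch of how a few of the settings above could be grouped before applying them. It assumes (this is not stated in the thread) that the `index.*` keys belong in a per-index settings request while the `indices.*` keys are node- or cluster-level. Whether a given key can be changed on a live index depends on the ES version, so check before sending anything. The name `split_by_scope` is hypothetical.

```python
import json

# A subset of the settings listed above, copied as-is.
SETTINGS = {
    "index.refresh_interval": "10s",
    "index.translog.flush_threshold_ops": 50000,
    "index.merge.policy.segments_per_tier": 4,
    "indices.memory.index_buffer_size": "50%",
    "indices.store.throttle.type": "none",
}

def split_by_scope(settings):
    """Separate per-index keys ("index.") from node/cluster keys ("indices.")."""
    per_index, node_level = {}, {}
    for key, value in settings.items():
        if key.startswith("index."):
            per_index[key] = value
        elif key.startswith("indices."):
            node_level[key] = value
        else:
            raise ValueError("unexpected setting prefix: " + key)
    return per_index, node_level

per_index, node_level = split_by_scope(SETTINGS)
# JSON body that a per-index settings request could carry.
print(json.dumps(per_index, indent=2, sort_keys=True))
```

Keeping the two groups apart makes it easier to see which values belong in the node configuration file and which could be sent per index.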



(Michael McCandless) #11

Where is the index stored on your EC2 instances? Is it just EBS-attached
storage (magnetic or SSD? provisioned IOPS or the default)?

Maybe try putting the index on the SSD instance storage instead? I realize
this is not a long term solution (limited storage, and it's cleared on
reboot), but it would be a simple test to see if the IO limitations of EBS
are the bottleneck here.

Can you capture the hot threads output when you're at 200% CPU after
indexing for a while?
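
A minimal sketch of capturing that output from Python, assuming a node is reachable at `http://localhost:9200` (the address is an assumption, as are the helper names). It uses the nodes hot threads endpoint, `/_nodes/hot_threads`, with its `threads` parameter.

```python
import urllib.request
from urllib.parse import urlencode

def hot_threads_url(base_url, threads=3):
    """Build the URL for the nodes hot threads endpoint."""
    query = urlencode({"threads": threads})
    return base_url.rstrip("/") + "/_nodes/hot_threads?" + query

def fetch_hot_threads(base_url, threads=3, timeout=10):
    """Return the plain-text hot threads report from a running node."""
    url = hot_threads_url(base_url, threads)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")

# Assumed address; replace with one of the cluster's nodes.
print(hot_threads_url("http://localhost:9200"))
```

Calling `fetch_hot_threads("http://localhost:9200")` while indexing is under way and saving the returned text to a file gives something that can be attached to a reply.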

Mike McCandless

http://blog.mikemccandless.com



(Jörg Prante) #12

Adding to these recommendations, I would suggest running the iostat tool to
watch for any suspicious "%iowait" states while
EsRejectedExecutionExceptions arise.
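
A minimal sketch of pulling "%iowait" out of captured iostat text, assuming the usual `avg-cpu:` header followed by a line of values. The sample text, the 20% cut-off and the function name `parse_iowait` are made up for illustration and do not come from the thread.

```python
def parse_iowait(iostat_text):
    """Return every %iowait value found in iostat avg-cpu reports."""
    values = []
    lines = iostat_text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("avg-cpu:") and i + 1 < len(lines):
            columns = line.split()[1:]      # column names after "avg-cpu:"
            numbers = lines[i + 1].split()  # values on the following line
            if "%iowait" in columns and len(numbers) == len(columns):
                values.append(float(numbers[columns.index("%iowait")]))
    return values

# Made-up sample in the avg-cpu layout; not taken from the thread.
SAMPLE = """\
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          12.50    0.00    3.10   41.75    0.25   42.40
"""

THRESHOLD = 20.0  # arbitrary cut-off for "suspicious" (an assumption)
for value in parse_iowait(SAMPLE):
    if value > THRESHOLD:
        print("high iowait: %.2f%%" % value)
```

Logging these values with timestamps next to the client-side rejection times would show whether the two line up.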

Jörg



(Srinath C) #13

Hi Michael,
You were right. It's the IO that was the bottleneck. The data was being
written to a standard EBS device - no provisioned IOPS.

After redirecting data to the local instance store SSD storage, I was
able to get to a rate of around 50-55K docs per second without any
EsRejectedExecutionExceptions. The CPU utilization is not too high either -
around 200-400%. I have attached the hot_threads output with this email.
After running for around 1.5 hrs, though, I could see a lot of
EsRejectedExecutionExceptions for certain periods of time.

std_ebs_all_fine.txt - when using standard EBS. Around 25K docs per second.
No EsRejectedExecutionExceptions.
std_ebs_bulk_rejects.txt - when using standard EBS. Around 28K docs per
second. No EsRejectedExecutionExceptions.

instance_ssd_40K.txt - when using instance store SSD. Around 40K docs per
second. No EsRejectedExecutionExceptions.
instance_ssd_60K_few_rejects.txt - when using instance store SSD. Around
60K docs per second. Some EsRejectedExecutionExceptions were seen.
instance_ssd_60K_lot_of_rejects.txt - when using instance store SSD. Around
60K docs per second. A lot of EsRejectedExecutionExceptions were seen.

Also attaching the iostat output for these instances.

Regards,
Srinath.

On Wed, Jul 16, 2014 at 3:34 PM, joergprante@gmail.com <
joergprante@gmail.com> wrote:

Adding to these recommendations, I would suggest running the iostat tool
to watch for any suspicious "%iowait" states when
EsRejectedExecutionExceptions arise.

Jörg

On Wed, Jul 16, 2014 at 11:53 AM, Michael McCandless <
mike@elasticsearch.com> wrote:

Where is the index stored on your EC2 instances? Is it just EBS-attached
storage (magnetic or SSD? provisioned IOPS or the default)?

Maybe try putting the index on the SSD instance storage instead? I
realize this is not a long term solution (limited storage, and it's cleared
on reboot), but it would be a simple test to see if the IO limitations of
EBS are the bottleneck here.

Can you capture the hot threads output when you're at 200% CPU after
indexing for a while?

Mike McCandless

http://blog.mikemccandless.com

On Wed, Jul 16, 2014 at 3:03 AM, Srinath C srinath.c@gmail.com wrote:

Hi Joe/Michael,
I tried all your suggestions and found a remarkable difference in the
way elasticsearch is able to handle the bulk indexing.
Right now, I'm able to ingest at the rate of 25K per second with the
same setup. But occasionally there are still some
EsRejectedExecutionExceptions being raised. The CPU utilization on the
elasticsearch nodes is so low (around 200% on an 8 core system) that it
seems that something else is wrong. I have also tried increasing
queue_size but it just delays the EsRejectedExecutionExceptions.

Any more suggestions on how to handle this?

Current setup: 4 c3.2xlarge instances of ES 1.2.2.
Current Configurations:
index.codec.bloom.load: false
index.compound_format: false
index.compound_on_flush: false
index.merge.policy.max_merge_at_once: 4
index.merge.policy.max_merge_at_once_explicit: 4
index.merge.policy.max_merged_segment: 1gb
index.merge.policy.segments_per_tier: 4
index.merge.policy.type: tiered
index.merge.scheduler.max_thread_count: 4
index.merge.scheduler.type: concurrent
index.refresh_interval: 10s
index.translog.flush_threshold_ops: 50000
index.translog.interval: 10s
index.warmer.enabled: false
indices.memory.index_buffer_size: 50%
indices.store.throttle.type: none

On Tue, Jul 15, 2014 at 6:24 PM, Srinath C srinath.c@gmail.com wrote:

Thanks Joe, Michael and all. Really appreciate your help.
I'll try out as per your suggestions and run the tests. Will post back
on my progress.

On Tue, Jul 15, 2014 at 3:17 PM, Michael McCandless <
mike@elasticsearch.com> wrote:

First off, upgrade ES to the latest (1.2.2) release; there have been a
number of bulk indexing improvements since 1.1.

Second, disable merge IO throttling.

Third, use the default settings, but increase index.refresh_interval
to perhaps 5s, and set index.translog.flush_threshold_ops to maybe 50000:
this decreases the frequency of Lucene level commits (= filesystem fsyncs).

If possible, use SSDs: they are much faster for merging.
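
The settings suggested above could be expressed as request bodies along
the following lines. This is only a sketch: it builds the JSON payloads
and prints them without contacting a cluster. The index name "events" is
hypothetical, and the endpoint paths are assumptions based on the update
settings APIs of the ES 1.x line, so they should be checked against the
documentation for the version in use.

```python
import json

# Hypothetical index name, used only for illustration.
INDEX_NAME = "events"

# Per-index settings suggested in the thread.
# Assumed target: PUT /<index>/_settings
index_settings = {
    "index.refresh_interval": "5s",
    "index.translog.flush_threshold_ops": 50000,
}

# Disable store (merge) IO throttling.
# Assumed target: PUT /_cluster/settings
cluster_settings = {
    "transient": {
        "indices.store.throttle.type": "none",
    }
}


def render(path, body):
    """Return the request line and JSON body as they would be sent."""
    return "PUT {}\n{}".format(path, json.dumps(body, indent=2, sort_keys=True))


if __name__ == "__main__":
    print(render("/{}/_settings".format(INDEX_NAME), index_settings))
    print(render("/_cluster/settings", cluster_settings))
```

Keeping the two payloads separate mirrors the fact that the refresh and
translog values are index-level settings while the throttle type is a
node/cluster-level one.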

Mike McCandless

http://blog.mikemccandless.com

On Mon, Jul 14, 2014 at 11:03 PM, Srinath C srinath.c@gmail.com
wrote:

Each document is around 300 bytes on average, which brings the data rate
to around 17 MB per sec.
This is running on ES version 1.1.1. I have been trying out different
values for these configurations. queue_size was increased when I got
EsRejectedExecutionException due to the queue filling up (default size of
50). segments_per_tier was picked up from some articles on scaling. What
would be a reasonable value based on my data rate?
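
As a rough check of the figure above, here is the arithmetic, assuming the
data rate refers to the 60K docs per second target and counts only raw
document bytes (no bulk action lines or transport overhead):

```python
AVG_DOC_BYTES = 300          # average document size quoted in the thread
TARGET_DOCS_PER_SEC = 60000  # target ingest rate quoted in the thread


def data_rate_bytes(docs_per_sec, doc_bytes=AVG_DOC_BYTES):
    """Raw payload rate in bytes per second."""
    return docs_per_sec * doc_bytes


rate = data_rate_bytes(TARGET_DOCS_PER_SEC)
print("{:.1f} MB/s".format(rate / 1e6))      # decimal megabytes: 18.0 MB/s
print("{:.1f} MiB/s".format(rate / 2**20))   # binary mebibytes: 17.2 MiB/s
```

So the quoted value of about 17 matches the target rate when expressed in
binary megabytes per second.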

If 60K seems to be too high are there any benchmarks available for
ElasticSearch?

Thanks all for your replies.

On Monday, 14 July 2014 15:25:13 UTC+5:30, Jörg Prante wrote:

index.merge.policy.segments_per_tier: 100 and
threadpool.bulk.queue_size: 500 are extreme settings that should be
avoided as they allocate a lot of resources. What you see with
UnavailableShardsException / NoNodes is congestion caused by such
extreme values.

What ES version is this? Why don't you use the default settings?

Jörg



(Srinath C) #14

Hi Michael,
Did you get a chance to look at the hot_threads and iostat output?
I also tried EBS Provisioned IOPS (SSD) volumes with 4000 IOPS, and with
that I was able to ingest only around 30K docs per second, beyond which
EsRejectedExecutionExceptions occur. There were 4 elasticsearch instances
of type c3.2xlarge. CPU utilization was around 650% (out of 800%). The
iostat output on the instances looks like this:

avg-cpu:  %user   %nice  %system  %iowait  %steal   %idle
           1.66    0.00     0.14     0.15    0.04   98.01

Device:      tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
xvdep1      7.86       36.95      266.05    392378   2825424
xvdf        0.03        0.20        0.00      2146         8
xvdg        0.03        0.21        0.07      2178       736
xvdj       52.53        0.33     2693.62      3506  28605624
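
To put the device figures above in more familiar units, the sketch below
converts the Blk_wrtn/s column to MB/s. It assumes the 512-byte block size
that iostat reports on Linux, and it hard-codes the device rows from this
post rather than running iostat. parse_devices is a helper written for
this illustration only.

```python
BLOCK_BYTES = 512  # assumption: iostat blocks are 512-byte sectors

# Device rows copied from the iostat report above.
REPORT = """\
xvdep1 7.86 36.95 266.05 392378 2825424
xvdf 0.03 0.20 0.00 2146 8
xvdg 0.03 0.21 0.07 2178 736
xvdj 52.53 0.33 2693.62 3506 28605624
"""


def parse_devices(report):
    """Map device name -> write rate in MB/s (decimal megabytes)."""
    rates = {}
    for line in report.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank or malformed lines
        name, blk_wrtn_per_sec = fields[0], float(fields[3])
        rates[name] = blk_wrtn_per_sec * BLOCK_BYTES / 1e6
    return rates


if __name__ == "__main__":
    for name, mb_per_sec in sorted(parse_devices(REPORT).items()):
        print("{:8s} {:6.2f} MB/s written".format(name, mb_per_sec))
```

Under that assumption the busiest device, xvdj, averages roughly 1.4 MB/s
of writes. The first report printed by iostat shows averages since boot,
and the 98% idle figure would be consistent with that, so an interval run
such as "iostat 5" taken during indexing is likely to be more
representative.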

On an instance store SSD I can go up to 48K per second with occasional
occurrences of EsRejectedExecutionException. Do you think I should try
storage optimized instances like i2.xlarge or i2.2xlarge to handle this
kind of load?

Regards,
Srinath.



(Michael McCandless) #15

Thanks Srinath, these are good results: basically the local SSD is much
(~2X) faster than EBS-attached storage, even with higher provisioned IOPS
for EBS.

I took a quick look at a few hot threads: they seem "correct" (ES is busy
indexing and merging).

I'm not sure why you're hitting EsRejectedExecutionException if you're
"only" using 7-9 clients to submit bulk requests, but this exception is
basically harmless: it means your clients are succeeding in hitting the
capacity of the cluster, and they just have to retry the rejected request.
I wish ES had a simple bulk streaming API so clients wouldn't have to deal
with things like this.
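
The retry behaviour described above might look roughly like this on the
client side. It is a generic sketch and not the Elasticsearch client API:
BulkRejected, send_bulk and the delay values are hypothetical stand-ins
for whatever the real client raises and calls.

```python
import random
import time


class BulkRejected(Exception):
    """Hypothetical stand-in for a rejection such as EsRejectedExecutionException."""


def send_with_retry(send_bulk, batch, max_attempts=5, base_delay=0.1,
                    sleep=time.sleep):
    """Call send_bulk(batch), backing off exponentially while it is rejected."""
    for attempt in range(max_attempts):
        try:
            return send_bulk(batch)
        except BulkRejected:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            # Exponential backoff plus jitter so that several clients
            # do not all retry at the same instant.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay))


if __name__ == "__main__":
    attempts = []

    def flaky_send(batch):
        # Simulated sender: rejects the first two calls, then succeeds.
        attempts.append(len(batch))
        if len(attempts) < 3:
            raise BulkRejected()
        return "indexed {} docs".format(len(batch))

    print(send_with_retry(flaky_send, [{"n": 1}, {"n": 2}], base_delay=0.01))
```

Backing off instead of retrying immediately gives the bulk queue on the
nodes time to drain, which is the point of treating the rejection as a
signal that the cluster is at capacity.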

Have you tested different numbers of documents in each bulk request?
That's another knob to play with...
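
One way to experiment with this knob is to make the batch size a parameter
of the code that groups documents into bulk requests. A minimal sketch,
where chunked is a helper name chosen for this illustration and the batch
size shown is arbitrary:

```python
from itertools import islice


def chunked(docs, batch_size):
    """Yield successive lists of at most batch_size documents."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(docs)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    # Ten small placeholder documents grouped into batches of four.
    docs = ({"id": i} for i in range(10))
    for batch in chunked(docs, 4):
        print(len(batch))  # prints 4, 4, 2
```

Because the helper consumes any iterable lazily, the same indexing run can
be repeated with different batch sizes without changing how documents are
produced.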

Are you letting ES auto-generate the ID, or are you providing an ID?

I think it's worth testing the storage optimized instances to see how
performance compares, but those are still volatile storage, right? I mean,
on boot, you lose all files on those fast "instance store" SSDs.

In general I suspect you'd get much faster performance from non-virtual
dedicated boxes....

Mike McCandless

http://blog.mikemccandless.com


--
You received this message because you are subscribed to the Google
Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/CAHhx-GK2LxUGTNOpnwH-PfnFx1Vwz8tGw7V-r50LZ5%3DUp5MPgQ%40mail.gmail.com
https://groups.google.com/d/msgid/elasticsearch/CAHhx-GK2LxUGTNOpnwH-PfnFx1Vwz8tGw7V-r50LZ5%3DUp5MPgQ%40mail.gmail.com?utm_medium=email&utm_source=footer
.

For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google
Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/CAD7smRdFrgsyFnUQqgSgvvr4MnJDtDz-sWci3Z3qmTt_FrKJ2g%40mail.gmail.com
https://groups.google.com/d/msgid/elasticsearch/CAD7smRdFrgsyFnUQqgSgvvr4MnJDtDz-sWci3Z3qmTt_FrKJ2g%40mail.gmail.com?utm_medium=email&utm_source=footer
.

For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to a topic in the
Google Groups "elasticsearch" group.
To unsubscribe from this topic, visit
https://groups.google.com/d/topic/elasticsearch/rlIuagf2_zY/unsubscribe.
To unsubscribe from this group and all its topics, send an email to
elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/CAKdsXoFKb5fV86CSgWJyDNgKepMi40KOHyxPqyxk38FPOEmP8g%40mail.gmail.com
https://groups.google.com/d/msgid/elasticsearch/CAKdsXoFKb5fV86CSgWJyDNgKepMi40KOHyxPqyxk38FPOEmP8g%40mail.gmail.com?utm_medium=email&utm_source=footer
.

For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/CAHhx-GLL5MN_6E%2B4F%3Duq_4sNwYkgdBJuLNaJf%2BNLMdf7Ld_6mw%40mail.gmail.com
https://groups.google.com/d/msgid/elasticsearch/CAHhx-GLL5MN_6E%2B4F%3Duq_4sNwYkgdBJuLNaJf%2BNLMdf7Ld_6mw%40mail.gmail.com?utm_medium=email&utm_source=footer
.

For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAD7smRc_Di68cbG__sdYQ6hgPxTD-OYoMx-UM2mDA8%3DdX3hDcA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


(Srinath C) #16

Yes Michael, the instance store SSDs are faring much better than the EBS
ones.

There are 7-9 clients, each using one bulk processor with 4 concurrent
requests. Does that mean that up to 9*4=36 requests could hit the same ES
instance at the same instant, and that when this happens there is a chance
of exceeding the queue_size of 50 on that instance?
Does the ES client retry on another instance when that happens?
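
A rough way to reason about the first question is sketched below. It assumes (this is not confirmed anywhere in the thread) that a bulk request is split into one sub-request per shard it touches, and that each sub-request takes a slot in the bulk thread pool of the node holding that shard. The shard counts are hypothetical, and the 8 bulk threads assume the pool is sized to the 8 cores of a c3.2xlarge. Only the 9 clients, the 4 concurrent requests and the queue_size of 50 come from this thread.

```python
def worst_case_tasks_per_node(clients, concurrent_per_client, shards_on_node):
    """Upper bound on shard-level bulk tasks that can land on one node at once.

    Assumes every in-flight bulk request touches every shard on the node.
    """
    in_flight_bulks = clients * concurrent_per_client
    return in_flight_bulks * shards_on_node


def may_reject(tasks, bulk_threads, queue_size):
    """True if the tasks cannot all be running or queued at the same time."""
    return tasks > bulk_threads + queue_size


if __name__ == "__main__":
    # 9 clients x 4 concurrent requests, queue_size 50 (figures from the thread).
    # bulk_threads=8 and the shard counts are assumptions for illustration.
    for shards_on_node in (1, 2, 3):
        tasks = worst_case_tasks_per_node(9, 4, shards_on_node)
        print(shards_on_node, tasks, may_reject(tasks, bulk_threads=8, queue_size=50))
```

Under these assumptions, 36 requests against a single shard per node fit within 8 threads plus 50 queue slots, while two or more shards per node can already overflow the queue.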

The data is generated in real time, so as the traffic peaks at around 40K
per second, the bulk size comes to around 10K bulk actions, and ES
seems to consume it with CPU peaking at around 500-600%.

The IDs are being auto-generated by ES itself. Would it make a difference
if I generated them? Is shard routing dependent on the IDs by default?
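
On the routing question: by default the routing value is the document ID, and the target shard is a hash of that value modulo the number of primary shards. A minimal sketch of the idea follows. Note that `zlib.crc32` is only a stand-in for Elasticsearch's internal hash function, and the shard count and IDs are made up.

```python
import zlib


def pick_shard(doc_id, num_primary_shards, routing=None):
    """Map a document to a shard: hash(routing or id) mod number of primaries.

    zlib.crc32 is a stand-in hash, NOT the function Elasticsearch uses.
    """
    key = routing if routing is not None else doc_id
    return zlib.crc32(key.encode("utf-8")) % num_primary_shards


if __name__ == "__main__":
    # Hypothetical IDs spread over a hypothetical 4 primary shards.
    for doc_id in ("doc-1", "doc-2", "doc-3", "doc-4"):
        print(doc_id, "->", pick_shard(doc_id, 4))
```

Because the IDs are hashed either way, auto-generated and client-supplied IDs should both spread reasonably evenly across shards.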

I think the storage optimized instances retain their data until the
instance is terminated; the data survives reboots. So I was wondering if I
could use those instances and set up appropriate backups to handle complete
node failures. Dedicated instances are not an option for me.

Thanks,
Srinath.

On Fri, Jul 18, 2014 at 6:20 PM, Michael McCandless mike@elasticsearch.com
wrote:

Thanks Srinath, these are good results: basically the local SSD is much
(~2X) faster than EBS attached instances, even with higher provisioned IOPs
for EBS.

I took a quick look at a few hot threads: they seem "correct" (ES is busy
indexing and merging).

I'm not sure why you're hitting EsRejectedExecutionException if you're
"only" using 7-9 clients to submit bulk requests, but this exception is
basically harmless: it means your clients are succeeding in hitting the
capacity of the cluster, and they just have to retry the rejected request.
I wish ES had a simple bulk streaming API so clients wouldn't have to deal
with things like this.

Have you tested different numbers of documents in each bulk request?
That's another knob to play with...

Are you letting ES auto-generate the ID, or are you providing an ID?

I think it's worth testing the storage optimized instances to see how
performance compares, but those are still volatile storage right? I mean,
on boot, you lose all files on those fast "instance store" SSDs.

In general I suspect you get much faster performance from non-virtual
dedicated boxes....

Mike McCandless

http://blog.mikemccandless.com

On Fri, Jul 18, 2014 at 12:36 AM, Srinath C srinath.c@gmail.com wrote:

Hi Michael,
Did you get a chance to look at the hot_threads and iostat output?
I also tried EBS Provisioned SSD with 4000 IOPS, and with that I
was able to ingest at only around 30K per second, beyond which
EsRejectedExecutionExceptions occur. There were 4 elasticsearch instances of type
c3.2xlarge. CPU utilization was around 650% (out of 800%). The iostat output
on the instances looks like this:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.66    0.00    0.14    0.15    0.04   98.01

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
xvdep1            7.86        36.95       266.05     392378    2825424
xvdf              0.03         0.20         0.00       2146          8
xvdg              0.03         0.21         0.07       2178        736
xvdj             52.53         0.33      2693.62       3506   28605624
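
To watch for the "%iowait" spikes that Jörg suggested monitoring, the avg-cpu block can be checked with a few lines of script. This is a minimal sketch: the 10% threshold is an arbitrary assumption, and the sample text is the avg-cpu block from the report above. Keep in mind that a one-shot iostat run reports averages since boot, so sampling at an interval (for example `iostat 5`) while the rejections occur is likely to be more telling.

```python
def parse_avg_cpu(report):
    """Return the avg-cpu figures of an iostat report as a dict of floats."""
    lines = report.strip().splitlines()
    for i, line in enumerate(lines):
        if line.startswith("avg-cpu:"):
            # Header looks like: avg-cpu: %user %nice %system %iowait %steal %idle
            names = [n.lstrip("%") for n in line.split()[1:]]
            values = [float(v) for v in lines[i + 1].split()]
            return dict(zip(names, values))
    raise ValueError("no avg-cpu block found")


def iowait_is_suspicious(report, threshold=10.0):
    """Flag a report whose %iowait exceeds the (assumed) threshold."""
    return parse_avg_cpu(report)["iowait"] > threshold


SAMPLE = """\
avg-cpu: %user %nice %system %iowait %steal %idle
1.66 0.00 0.14 0.15 0.04 98.01
"""

if __name__ == "__main__":
    print(parse_avg_cpu(SAMPLE))
    print(iowait_is_suspicious(SAMPLE))
```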

On an instance store SSD I can go up to 48K per second with occasional
occurrences of EsRejectedExecutionException. Do you think I should try
storage optimized instances like i2.xlarge or i2.2xlarge to handle this
kind of load?

Regards,
Srinath.

On Wed, Jul 16, 2014 at 5:57 PM, Srinath C srinath.c@gmail.com wrote:

Hi Michael,
You were right. It's the IO that was the bottleneck. The data was
being written to a standard EBS device with no provisioned IOPS.

After redirecting data to the local instance store SSD storage, I
was able to get to a rate of around 50-55K without any
EsRejectedExecutionExceptions. CPU utilization is not too high either, at
around 200-400%. I have attached the hot_threads output to this email.
After running for around 1.5 hrs I could see a lot of
EsRejectedExecutionExceptions for certain periods of time.

std_ebs_all_fine.txt - when using standard EBS. Around 25K docs per
second. No EsRejectedExecutionExceptions.
std_ebs_bulk_rejects.txt - when using standard EBS. Around 28K docs per
second. No EsRejectedExecutionExceptions.

instance_ssd_40K.txt - when using instance store SSD. Around 40K docs
per second. No EsRejectedExecutionExceptions.
instance_ssd_60K_few_rejects.txt - when using instance store SSD. Around
60K docs per second. Some EsRejectedExecutionExceptions were seen.
instance_ssd_60K_lot_of_rejects.txt - when using instance store SSD.
Around 60K docs per second. A lot of EsRejectedExecutionExceptions were
seen.

Also attaching the iostat output for these instances.

Regards,
Srinath.

On Wed, Jul 16, 2014 at 3:34 PM, joergprante@gmail.com <
joergprante@gmail.com> wrote:

Adding to these recommendations, I would suggest running the iostat tool to
monitor for any suspicious "%iowait" states while
EsRejectedExecutionExceptions arise.

Jörg

On Wed, Jul 16, 2014 at 11:53 AM, Michael McCandless <
mike@elasticsearch.com> wrote:

Where is the index stored in your EC2 instances? Is it just EBS
attached storage (magnetic or SSD? provisioned IOPS or the default)?

Maybe try putting the index on the SSD instance storage instead? I
realize this is not a long term solution (limited storage, and it's cleared
on reboot), but it would be a simple test to see if the IO limitations of
EBS are the bottleneck here.

Can you capture the hot threads output when you're at 200% CPU after
indexing for a while?

Mike McCandless

http://blog.mikemccandless.com

On Wed, Jul 16, 2014 at 3:03 AM, Srinath C srinath.c@gmail.com
wrote:

Hi Joe/Michael,
I tried all your suggestions and found a remarkable difference in
the way elasticsearch handles the bulk indexing.
Right now, I'm able to ingest at a rate of 25K per second with
the same setup. But occasionally some
EsRejectedExecutionExceptions are still raised. The CPU utilization on the
elasticsearch nodes is so low (around 200% on an 8 core system) that it
seems something else is wrong. I have also tried increasing
queue_size, but that just delays the EsRejectedExecutionException.

Any more suggestions on how to handle this?

Current setup: 4 c3.2xlarge instances of ES 1.2.2.
Current Configurations:
index.codec.bloom.load: false
index.compound_format: false
index.compound_on_flush: false
index.merge.policy.max_merge_at_once: 4
index.merge.policy.max_merge_at_once_explicit: 4
index.merge.policy.max_merged_segment: 1gb
index.merge.policy.segments_per_tier: 4
index.merge.policy.type: tiered
index.merge.scheduler.max_thread_count: 4
index.merge.scheduler.type: concurrent
index.refresh_interval: 10s
index.translog.flush_threshold_ops: 50000
index.translog.interval: 10s
index.warmer.enabled: false
indices.memory.index_buffer_size: 50%
indices.store.throttle.type: none

On Tue, Jul 15, 2014 at 6:24 PM, Srinath C srinath.c@gmail.com
wrote:

Thanks Joe, Michael and all. Really appreciate your help.
I'll try out your suggestions and run the tests. Will post
back on my progress.

On Tue, Jul 15, 2014 at 3:17 PM, Michael McCandless <
mike@elasticsearch.com> wrote:

First off, upgrade ES to the latest (1.2.2) release; there have
been a number of bulk indexing improvements since 1.1.

Second, disable merge IO throttling.

Third, use the default settings, but increase
index.refresh_interval to perhaps 5s, and set
index.translog.flush_threshold_ops to maybe 50000: this decreases the
frequency of Lucene level commits (= filesystem fsyncs).

If possible, use SSDs: they are much faster for merging.
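
For reference, the second and third suggestions can be applied at runtime through the update-settings APIs instead of editing the config file. The sketch below only builds the JSON request bodies and sends nothing. The setting names are the ones used in this thread; the index name `myindex` is a placeholder, and whether all three settings are dynamically updatable on your version is an assumption worth verifying.

```python
import json

# Index-level settings: fewer refreshes and fewer Lucene commits (fsyncs).
index_settings = {
    "index": {
        "refresh_interval": "5s",
        "translog.flush_threshold_ops": 50000,
    }
}

# Cluster-level setting: disable merge IO (store) throttling.
cluster_settings = {
    "transient": {
        "indices.store.throttle.type": "none",
    }
}


def as_request(method, path, body):
    """Render a request as 'METHOD path' followed by its JSON body."""
    return "{} {}\n{}".format(method, path, json.dumps(body, indent=2, sort_keys=True))


if __name__ == "__main__":
    # "myindex" is a placeholder index name.
    print(as_request("PUT", "/myindex/_settings", index_settings))
    print(as_request("PUT", "/_cluster/settings", cluster_settings))
```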

Mike McCandless

http://blog.mikemccandless.com

On Mon, Jul 14, 2014 at 11:03 PM, Srinath C srinath.c@gmail.com
wrote:

Each document is around 300 bytes on average, so that brings the
data rate to around 17 MB per second.
This is running on ES version 1.1.1. I have been trying out
different values for these configurations. queue_size was increased when I
got EsRejectedExecutionException due to the queue filling up (default size
of 50). segments_per_tier was picked up from some articles on scaling. What
would be a reasonable value based on my data rate?
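
The figure of roughly 17 MB per second can be checked with quick arithmetic, assuming it refers to the 60K docs per second target and counts a megabyte as 2^20 bytes. The other two rates are the 28-36K range reported in the first post.

```python
def data_rate_mib_per_s(docs_per_second, avg_doc_bytes):
    """Raw document payload rate in MiB/s (excludes index and translog overhead)."""
    return docs_per_second * avg_doc_bytes / 2**20


if __name__ == "__main__":
    for rate in (28_000, 36_000, 60_000):
        print(rate, round(data_rate_mib_per_s(rate, 300), 1))
```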

If 60K seems to be too high, are there any benchmarks available for
ElasticSearch?

Thanks all for your replies.

On Monday, 14 July 2014 15:25:13 UTC+5:30, Jörg Prante wrote:

index.merge.policy.segments_per_tier: 100 and
threadpool.bulk.queue_size: 500 are extreme settings that should
be avoided as they consume a lot of resources. The UnavailableShardsException
/ NoNodes errors you see are congestion caused by such extreme values.

What ES version is this? Why don't you use the default settings?

Jörg



(Jörg Prante) #17

EsRejectedExecutionException is not harmless: the rejected documents never
make it into the index unless the client sends them again. There is no
automatic retry, not even on another instance.
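
Since nothing retries for you, the client has to resend whatever was rejected. The sketch below shows one way to do that with exponential backoff. `send_bulk` is a hypothetical callable standing in for the real bulk call: it takes a list of documents and returns the ones that were rejected. The retry count and delays are arbitrary assumptions.

```python
import time


def index_with_retry(docs, send_bulk, max_retries=5, base_delay=0.1, sleep=time.sleep):
    """Send docs, resending rejected ones with exponential backoff.

    send_bulk(docs) is a hypothetical callable returning the rejected docs.
    Returns the docs still rejected after max_retries resends (ideally empty).
    """
    pending = list(docs)
    for attempt in range(max_retries + 1):
        if not pending:
            break
        pending = send_bulk(pending)
        if pending and attempt < max_retries:
            sleep(base_delay * 2 ** attempt)  # back off before resending
    return pending


if __name__ == "__main__":
    calls = []

    def flaky_send(batch):
        # Fake sender: rejects the second half of the batch on the first call only.
        calls.append(len(batch))
        return batch[len(batch) // 2:] if len(calls) == 1 else []

    leftover = index_with_retry(list(range(10)), flaky_send, sleep=lambda s: None)
    print(calls, leftover)
```

Anything still returned after the last attempt should be logged or parked somewhere durable, otherwise it is lost.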

If you queue up 10k actions in a single bulk request, a simple way to
get the queue rejections under control is to increase the actions per bulk
request to 15k or 20k and to decrease the maximum number of concurrent bulk
requests. Both are settings of the
org.elasticsearch.action.bulk.BulkProcessor class.

You can balance these two settings to find the "sweet spot" of your
system by monitoring the bulk response times. The "sweet spot" is where
the response times are minimal.
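
Finding that sweet spot amounts to measuring bulk response times for a handful of (actions per bulk, concurrent requests) combinations and keeping the best one. A minimal sketch follows. The timings are made-up placeholders, not measurements from this cluster, and dividing by the number of actions (so that larger bulks are not penalised merely for being larger) is an assumption of this sketch rather than something stated above.

```python
from statistics import mean


def sweet_spot(samples):
    """Pick the config with the lowest mean response time per bulk action.

    samples maps (bulk_actions, concurrent_requests) to a list of bulk
    response times in milliseconds.
    """
    def ms_per_action(config):
        bulk_actions, _concurrency = config
        return mean(samples[config]) / bulk_actions

    return min(samples, key=ms_per_action)


if __name__ == "__main__":
    # Made-up placeholder timings, NOT measurements from this thread.
    samples = {
        (10_000, 4): [900, 1100, 1000],
        (15_000, 2): [1200, 1250, 1150],
        (20_000, 2): [2100, 2300, 2200],
    }
    print(sweet_spot(samples))
```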

Jörg



(Michael McCandless) #18

On Fri, Jul 18, 2014 at 9:26 AM, Srinath C srinath.c@gmail.com wrote:

Yes Michael, the instance store SSD are faring much better than the EBS
ones.

In your EBS tests, were those SSDs attached via EBS? Or magnetic?

There are 7-9 clients each using one bulk processor with concurrent
requests of 4 each. Does it mean that there could be a possibility of
9*4=36 requests hitting the same ES instance at the same instant, and when
that happens there is a chance of exceeding the queue_size of 50 on that
instance? Does the ES client retry on another instance when that happens?

Yes, up to 36 concurrent requests; from looking at the docs (
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-threadpool.html
) it looks like the queue size (to hold the backlog of incoming indexing
requests until a thread frees up to service them) defaults to -1
(unbounded) ... so now I'm not sure why you're even hitting this exception,
but I don't have a lot of experience here.
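
As a rough check on the numbers in this exchange, the worst-case count of in-flight bulk requests can be compared with what one node's bulk thread pool can hold. This is only a sketch: the function names are made up for illustration, the pool size of 8 threads is an assumption (one thread per core on an 8-core c3.2xlarge), the queue size of 50 is the figure quoted in the thread, and any server-side splitting of a bulk request into smaller units is ignored.

```python
def max_in_flight(workers: int, concurrent_requests: int) -> int:
    """Worst-case number of bulk requests outstanding across all clients."""
    return workers * concurrent_requests


def node_capacity(pool_threads: int, queue_size: int) -> int:
    """Requests one node can execute or hold in its queue before rejecting."""
    return pool_threads + queue_size


if __name__ == "__main__":
    in_flight = max_in_flight(workers=9, concurrent_requests=4)
    # pool_threads=8 is an assumption; queue_size=50 is taken from the thread
    capacity = node_capacity(pool_threads=8, queue_size=50)
    print(in_flight, capacity, in_flight > capacity)
```

By this simple count the 36 client requests fit within the node's slots even if they all land on one node, so client-side concurrency alone would not explain the rejections.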

The data is generated in real-time so as the traffic peaks up to around
40K per second, the bulk size appears to be around 10k bulk actions and ES
seems to consume it with CPU peaking to around 500-600%.

You mean you batch up 10K indexing operations before submitting the bulk
request to ES?

The IDs are being auto-generated by ES itself. Would it make a difference
if I generated them? Is shard routing dependent on the IDs by default?

It's best to let ES auto-generate: it optimizes this case.

I think the storage optimized instances retain the data until the instance
is terminated. The data is retained across reboots.

Ahh in fact this is true for all instance storage (not just storage
optimized instances); I was just confused before. So then, yes, I think
storage optimized instance is well worth testing, assuming your index can
fit into that storage.

So I was wondering if I could use those instances and set up appropriate
backups to handle complete node failures. And dedicated instances are not an
option for me.

Well, even with the storage optimized instance, those local SSDs are being
shared with other virtual machines on that same box which will taint the
results. But likely it will give the best performance.



(Srinath C) #19

Thanks Jörg. I will try to batch more documents per bulk request and send
fewer bulk requests to the ES instances by tuning the concurrent requests.
Let's see how that fares.

On Fri, Jul 18, 2014 at 7:07 PM, joergprante@gmail.com <
joergprante@gmail.com> wrote:

EsRejectedExecutionException is not harmless: the rejected documents are
dropped from the index if they are not sent again by the client. There is
no automatic retry, not even on another instance.
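
Since rejected documents are lost unless the client sends them again, a client-side retry loop is one way to handle this. The sketch below is a generic illustration, not the Elasticsearch client API: `send_bulk` is a hypothetical callable that returns one boolean per document (True meaning accepted), and the attempt count and delays are arbitrary assumptions.

```python
import time
from typing import Callable, List, Sequence


def send_with_retry(
    docs: Sequence[dict],
    send_bulk: Callable[[Sequence[dict]], List[bool]],
    max_attempts: int = 5,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> List[dict]:
    """Resend rejected documents with exponential backoff.

    send_bulk is a hypothetical callable returning one bool per document
    (True = accepted). Returns the documents still rejected at the end.
    """
    pending = list(docs)
    for attempt in range(max_attempts):
        if not pending:
            break
        accepted = send_bulk(pending)
        # keep only the documents the server did not accept
        pending = [d for d, ok in zip(pending, accepted) if not ok]
        if pending and attempt < max_attempts - 1:
            sleep(base_delay_s * (2 ** attempt))  # back off before retrying
    return pending
```

Backing off between attempts gives the cluster time to drain its queue instead of adding to the congestion that caused the rejection.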

If you queue up 10k actions in a single bulk request, a simple way to
get the queue rejections under control is to increase the actions per bulk
request to 15k or 20k and to decrease the maximum bulk request concurrency,
as shown in the org.elasticsearch.action.bulk.BulkProcessor class.

You can tune this configuration to find the "sweet spot" of
your system by monitoring the bulk response times. The "sweet spot" is
where the response times are minimal.
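
One way to compare measured configurations when looking for such a sweet spot is sketched below. Everything here is an assumption for illustration: the sample timings are invented, and dividing the mean response time by the number of documents in flight (actions times concurrency) is one way to make batches of different sizes comparable, not something prescribed in the thread.

```python
from statistics import mean
from typing import Dict, List, Tuple

Config = Tuple[int, int]  # (actions per bulk request, concurrent requests)


def sweet_spot(samples: Dict[Config, List[float]]) -> Config:
    """Return the configuration with the lowest mean time per document."""
    def cost(cfg: Config) -> float:
        actions, concurrency = cfg
        # seconds of response time per document in flight
        return mean(samples[cfg]) / (actions * concurrency)
    return min(samples, key=cost)


if __name__ == "__main__":
    # Hypothetical response times in seconds, not measurements from this thread
    measured = {
        (10_000, 4): [3.0, 3.4],
        (15_000, 2): [1.6, 1.8],
        (20_000, 2): [2.9, 3.1],
    }
    print(sweet_spot(measured))
```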

Jörg

On Fri, Jul 18, 2014 at 3:26 PM, Srinath C srinath.c@gmail.com wrote:

Yes Michael, the instance store SSD are faring much better than the EBS
ones.

There are 7-9 clients each using one bulk processor with concurrent
requests of 4 each. Does it mean that there could be a possibility of
9*4=36 requests hitting the same ES instance at the same instant, and when
that happens there is a chance of exceeding the queue_size of 50 on that
instance? Does the ES client retry on another instance when that happens?

The data is generated in real-time so as the traffic peaks up to around
40K per second, the bulk size appears to be around 10k bulk actions and ES
seems to consume it with CPU peaking to around 500-600%.

The IDs are being auto-generated by ES itself. Would it make a difference
if I generated them? Is shard routing dependent on the IDs by default?

I think the storage optimized instances retain the data until the instance
is terminated. The data is retained across reboots. So I was wondering if I
could use those instances and set up appropriate backups to handle complete
node failures. And dedicated instances are not an option for me.

Thanks,
Srinath.

On Fri, Jul 18, 2014 at 6:20 PM, Michael McCandless <
mike@elasticsearch.com> wrote:

Thanks Srinath, these are good results: basically the local SSD is
much (~2X) faster than EBS attached instances, even with higher provisioned
IOPs for EBS.

I took a quick look at a few hot threads: they seem "correct" (ES is
busy indexing and merging).

I'm not sure why you're hitting EsRejectedExecutionException if you're
"only" using 7-9 clients to submit bulk requests, but this exception is
basically harmless: it means your clients are succeeding in hitting the
capacity of the cluster, and they just have to retry the rejected request.
I wish ES had a simple bulk streaming API so clients wouldn't have to deal
with things like this.

Have you tested different numbers of documents in each bulk request?
That's another knob to play with...

Are you letting ES auto-generate the ID, or are you providing an ID?

I think it's worth testing the storage optimized instances to see how
performance compares, but those are still volatile storage right? I mean,
on boot, you lose all files on those fast "instance store" SSDs.

In general I suspect you get much faster performance from non-virtual
dedicated boxes....

Mike McCandless

http://blog.mikemccandless.com

On Fri, Jul 18, 2014 at 12:36 AM, Srinath C srinath.c@gmail.com wrote:

Hi Michael,
Did you get a chance to look at the hot_threads and iostat output?
I also tried EBS Provisioned SSD with 4000 IOPS, and with that I
was able to ingest only at around 30K per second, after which there were
EsRejectedExecutionExceptions. There were 4 elasticsearch instances of type
c3.2xlarge. CPU utilization was around 650% (out of 800%). The iostat output
on the instances looks like this:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.66    0.00    0.14    0.15    0.04   98.01

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
xvdep1            7.86        36.95       266.05     392378    2825424
xvdf              0.03         0.20         0.00       2146          8
xvdg              0.03         0.21         0.07       2178        736
xvdj             52.53         0.33      2693.62       3506   28605624
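
To make the iostat figures easier to read, the block counts can be converted to bytes. The sketch below assumes that an iostat "block" is a 512-byte sector, as on Linux, and uses the busiest device in the output (xvdj) as the example; the helper names are made up for illustration. If this was the first report iostat printed, the figures are averages since boot rather than values under load.

```python
SECTOR_BYTES = 512  # assumption: iostat reports "blocks" as 512-byte sectors


def write_mb_per_s(blk_wrtn_per_s: float) -> float:
    """Convert iostat's Blk_wrtn/s column to megabytes per second."""
    return blk_wrtn_per_s * SECTOR_BYTES / 1e6


def avg_request_kb(tps: float, blk_read_per_s: float, blk_wrtn_per_s: float) -> float:
    """Average size of one I/O request in kilobytes (tps = requests per second)."""
    return (blk_read_per_s + blk_wrtn_per_s) * SECTOR_BYTES / tps / 1e3


if __name__ == "__main__":
    # xvdj row from the output above: tps=52.53, reads=0.33, writes=2693.62
    print(round(write_mb_per_s(2693.62), 2))
    print(round(avg_request_kb(52.53, 0.33, 2693.62), 1))
```

The `tps` column is the number of I/O requests per second issued to the device, so it is the figure to compare against a provisioned IOPS limit.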

On an instance store SSD I can go up to 48K per second with occasional
occurrences of EsRejectedExecutionException. Do you think I should try
storage optimized instances like i2.xlarge or i2.2xlarge to handle this
kind of load?

Regards,
Srinath.

On Wed, Jul 16, 2014 at 5:57 PM, Srinath C srinath.c@gmail.com wrote:

Hi Michael,
You were right. It's the IO that was the bottleneck. The data was
being written to a standard EBS device with no provisioned IOPS.

After redirecting data to the local instance store SSD storage, I
was able to get to a rate of around 50-55K without any
EsRejectedExecutionExceptions. The CPU utilization is not too high either:
around 200-400%. I have attached the hot_threads output to this email.
After running for around 1.5 hrs I could see a lot of
EsRejectedExecutionExceptions for certain periods of time.

std_ebs_all_fine.txt - when using standard EBS. Around 25K docs per
second. No EsRejectedExecutionExceptions.
std_ebs_bulk_rejects.txt - when using standard EBS. Around 28K docs
per second. No EsRejectedExecutionExceptions.

instance_ssd_40K.txt - when using instance store SSD. Around 40K docs
per second. No EsRejectedExecutionExceptions.
instance_ssd_60K_few_rejects.txt - when using instance store SSD.
Around 60K docs per second. Some EsRejectedExecutionExceptions were seen.
instance_ssd_60K_lot_of_rejects.txt - when using instance store SSD.
Around 60K docs per second. A lot of EsRejectedExecutionExceptions were
seen.

Also attaching the iostat output for these instances.

Regards,
Srinath.

On Wed, Jul 16, 2014 at 3:34 PM, joergprante@gmail.com <
joergprante@gmail.com> wrote:

Adding to these recommendations, I would suggest running the iostat tool
to monitor for any suspicious "%iowait" states while
EsRejectedExecutionExceptions arise.

Jörg

On Wed, Jul 16, 2014 at 11:53 AM, Michael McCandless <
mike@elasticsearch.com> wrote:

Where is the index stored in your EC2 instances? Is it just
EBS attached storage (magnetic or SSD? provisioned IOPs or the default)?

Maybe try putting the index on the SSD instance storage instead? I
realize this is not a long term solution (limited storage, and it's cleared
on reboot), but it would be a simple test to see if the IO limitations of
EBS are the bottleneck here.

Can you capture the hot threads output when you're at 200% CPU after
indexing for a while?

Mike McCandless

http://blog.mikemccandless.com

On Wed, Jul 16, 2014 at 3:03 AM, Srinath C srinath.c@gmail.com
wrote:

Hi Joe/Michael,
I tried all your suggestions and found a remarkable difference
in the way elasticsearch is able to handle the bulk indexing.
Right now, I'm able to ingest at the rate of 25K per second with
the same setup. But occasionally there are still some
EsRejectedExecutionExceptions being raised. The CPU utilization on the
elasticsearch nodes is so low (around 200% on an 8 core system) that it
seems that something else is wrong. I have also tried increasing
queue_size but it just delays the EsRejectedExecutionException.

Any more suggestions on how to handle this?

Current setup: 4 c3.2xlarge instances of ES 1.2.2.
Current Configurations:
index.codec.bloom.load: false
index.compound_format: false
index.compound_on_flush: false
index.merge.policy.max_merge_at_once: 4
index.merge.policy.max_merge_at_once_explicit: 4
index.merge.policy.max_merged_segment: 1gb
index.merge.policy.segments_per_tier: 4
index.merge.policy.type: tiered
index.merge.scheduler.max_thread_count: 4
index.merge.scheduler.type: concurrent
index.refresh_interval: 10s
index.translog.flush_threshold_ops: 50000
index.translog.interval: 10s
index.warmer.enabled: false
indices.memory.index_buffer_size: 50%
indices.store.throttle.type: none

On Tue, Jul 15, 2014 at 6:24 PM, Srinath C srinath.c@gmail.com
wrote:

Thanks Joe, Michael and all. Really appreciate your help.
I'll try out your suggestions and run the tests. Will post
back on my progress.

On Tue, Jul 15, 2014 at 3:17 PM, Michael McCandless <
mike@elasticsearch.com> wrote:

First off, upgrade ES to the latest (1.2.2) release; there have
been a number of bulk indexing improvements since 1.1.

Second, disable merge IO throttling.

Third, use the default settings, but increase
index.refresh_interval to perhaps 5s, and set
index.translog.flush_threshold_ops to maybe 50000: this decreases the
frequency of Lucene level commits (= filesystem fsyncs).
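
To get a feel for how index.translog.flush_threshold_ops affects commit frequency, the interval between translog-triggered flushes can be estimated from the indexing rate. This is a back-of-the-envelope sketch under stated assumptions: the threshold is counted per shard, documents are spread evenly over the primary shards, replicas and time-based flushes are ignored, and the shard count of 8 is hypothetical because the thread does not state it.

```python
def seconds_between_flushes(
    flush_threshold_ops: int, docs_per_s: float, primary_shards: int
) -> float:
    """Estimate seconds between translog-triggered flushes on one shard."""
    per_shard_rate = docs_per_s / primary_shards
    return flush_threshold_ops / per_shard_rate


if __name__ == "__main__":
    # 50000 ops is the suggested setting; 25000 docs/s is a rate from the thread;
    # 8 primary shards is a hypothetical value
    print(seconds_between_flushes(50_000, 25_000, 8))
```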

If possible, use SSDs: they are much faster for merging.

Mike McCandless

http://blog.mikemccandless.com

On Mon, Jul 14, 2014 at 11:03 PM, Srinath C srinath.c@gmail.com
wrote:

Each document is around 300 bytes on average, which brings
the data rate to around 17 MB per second.
This is running on ES version 1.1.1. I have been trying out
different values for these configurations. queue_size was increased when I
got EsRejectedExecutionException due to the queue filling up (default size
of 50). segments_per_tier was picked up from some articles on scaling. What
would be a reasonable value based on my data rate?
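
The data rate quoted here follows from the target of 60K documents per second and the 300-byte average. A minimal sketch of the arithmetic, counting raw document bytes only (replication, the translog and index structures add more on top; the function name is made up for illustration):

```python
def data_rate_mb_per_s(docs_per_s: int, avg_doc_bytes: int) -> float:
    """Raw ingest rate in megabytes (10^6 bytes) per second."""
    return docs_per_s * avg_doc_bytes / 1e6


if __name__ == "__main__":
    mb = data_rate_mb_per_s(60_000, 300)
    mib = 60_000 * 300 / 2**20  # the same rate in mebibytes per second
    print(mb, round(mib, 1))
```

The result is 18 MB/s in decimal units, or about 17.2 MiB/s, which matches the "around 17" figure.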

If 60K seems to be too high are there any benchmarks available
for ElasticSearch?

Thanks all for your replies.

On Monday, 14 July 2014 15:25:13 UTC+5:30, Jörg Prante wrote:

index.merge.policy.segments_per_tier: 100 and
threadpool.bulk.queue_size: 500 are extreme settings that
should be avoided as they allocate a lot of resources. What you see with
UnavailableShardsException / NoNodes is congestion because of such extreme values.

What ES version is this? Why don't you use the default settings?

Jörg



(Srinath C) #20

Michael, see inline for my replies...

On Fri, Jul 18, 2014 at 7:36 PM, Michael McCandless mike@elasticsearch.com
wrote:

On Fri, Jul 18, 2014 at 9:26 AM, Srinath C srinath.c@gmail.com wrote:

Yes Michael, the instance store SSD are faring much better than the EBS
ones.

In your EBS tests, were those SSDs attached via EBS? Or magnetic?

[Srinath:] Yes, I tried both General Purpose SSD and Provisioned SSD,
not the magnetic EBS. I think the max IOPS offered is 4000. Any idea how I
can estimate what IOPS I need?

There are 7-9 clients each using one bulk processor with concurrent
requests of 4 each. Does it mean that there could be a possibility of
9*4=36 requests hitting the same ES instance at the same instant, and when
that happens there is a chance of exceeding the queue_size of 50 on that
instance? Does the ES client retry on another instance when that happens?

Yes, up to 36 concurrent requests; from looking at the docs (
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-threadpool.html
) it looks like the queue size (to hold the backlog of incoming indexing
requests until a thread frees up to service them) defaults to -1
(unbounded) ... so now I'm not sure why you're even hitting this exception,
but I don't have a lot of experience here.

The data is generated in real-time so as the traffic peaks up to around
40K per second, the bulk size appears to be around 10k bulk actions and ES
seems to consume it with CPU peaking to around 500-600%.

You mean you batch up 10K indexing operations before submitting the bulk
request to ES?

[Srinath:] The batching is done based on number of operations as well as
time. Documents are continuously added into the bulk processor which is
configured with max actions as 25k and flush interval of 5s.
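
With both a size limit and a time limit on the bulk processor, which one fires first depends on how fast documents arrive. The sketch below is a simplification under stated assumptions: a steady arrival rate per worker, an empty buffer at the start of each interval, and a made-up helper name. The 4000 docs/s per worker figure comes from the original post; the result is an upper bound on the batch size, not a prediction of what the cluster will see.

```python
from typing import Tuple


def first_trigger(
    docs_per_s: float, max_actions: int, flush_interval_s: float
) -> Tuple[str, int]:
    """Return which limit flushes the batch first and the batch size at that point."""
    fill_time_s = max_actions / docs_per_s
    if fill_time_s <= flush_interval_s:
        return "size", max_actions
    return "time", int(docs_per_s * flush_interval_s)


if __name__ == "__main__":
    # 4000 docs/s per worker, max actions 25k, flush interval 5s
    print(first_trigger(4_000, 25_000, 5))
```

At 4000 docs/s the buffer would take 6.25 seconds to reach 25k actions, so under these assumptions the 5 second flush interval fires first with at most about 20k actions.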

The IDs are being auto-generated by ES itself. Would it make a difference
if I generated them? Is shard routing dependent on the IDs by default?

It's best to let ES auto-generate: it optimizes this case.

[Srinath:] Ok

I think the storage optimized instances retain the data until the instance
is terminated. The data is retained across reboots.

Ahh in fact this is true for all instance storage (not just storage
optimized instances); I was just confused before. So then, yes, I think
storage optimized instance is well worth testing, assuming your index can
fit into that storage.

[Srinath:] Yes. Will give it a try. But they are pretty expensive: almost four
times the cost of compute optimized instances. i2.xlarge comes with 1 x 800 GB
of SSD and should be good enough but is low on compute. So i2.2xlarge seems
inevitable, but it is really expensive.

So I was wondering if I could use those instances and set up appropriate
backups to handle complete node failures. And dedicated instances are not an
option for me.

Well, even with the storage optimized instance, those local SSDs are being
shared with other virtual machines on that same box which will taint the
results. But likely it will give the best performance.

[Srinath:] Sure, will try it out.
