Elasticsearch Load Test goes OutOfMemory

Hi all,

I need some help here. I started a load test for Elasticsearch before using
it in a production environment. I have three EC2 instances, configured in
the following manner, which form an Elasticsearch cluster.

All three machines have the same hardware configuration.

32GB RAM
160GB SSD hard disk
8 core CPU

Machine 01
Elasticsearch server (16GB heap)
Elasticsearch Java client (generates a continuous load and reports to ES - 4GB heap)

Machine 02
Elasticsearch server (16GB heap)
Elasticsearch Java client (generates a continuous load and reports to ES - 4GB heap)

Machine 03
Elasticsearch server (16GB heap)
Elasticsearch Java client (queries ES continuously - 1GB heap)

Note that the two loading clients together generate around 20K records per
second and submit them as bulk requests with an average size of 25 documents.
The other client runs only one query per second. My documents have the
following format.

{
  "_index": "my_index",
  "_type": "my_type",
  "_id": "7334236299916134105",
  "_score": 3.6111107,
  "_source": {
    "long_1": 96186289301793,
    "long_2": 7334236299916134000,
    "string_1": "random_string",
    "long_3": 96186289301793,
    "string_2": "random_string",
    "string_3": "random_string",
    "string_4": "random_string",
    "string_5": "random_string",
    "long_4": 5457314198948537000
  }
}
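For concreteness, the ingest side can be sketched like this: a minimal Python sketch of Elasticsearch's newline-delimited bulk format (one action line plus one source line per document). The index and type names are the ones above; the actual HTTP send is omitted, and the sample field values are from the document shown:

```python
import json

def build_bulk_body(docs, index="my_index", doc_type="my_type"):
    """Serialize documents into the newline-delimited bulk format:
    an action line followed by a source line per document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    # Bulk request bodies must end with a trailing newline.
    return "\n".join(lines) + "\n"

# One batch of average size 25, as described above.
batch = [{"long_1": 96186289301793, "string_1": "random_string"}] * 25
body = build_bulk_body(batch)
```

At 20K records/second and 25 documents per batch, that is roughly 800 bulk requests per second hitting the cluster.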

The problem is, after a few minutes, Elasticsearch reports errors like these
in the logs.

[2015-02-24 08:03:58,070][ERROR][marvel.agent.exporter ] [Gateway] create failure (index:[.marvel-2015.02.24] type: [cluster_stats]): RemoteTransportException[[Marvel Girl][inet[/10.167.199.140:9300]][bulk/shard]]; nested: EsRejectedExecutionException[rejected execution (queue capacity 50) on org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1@76dbf01];

[2015-02-25 04:23:36,459][ERROR][marvel.agent.exporter ] [Wildside] create failure (index:[.marvel-2015.02.25] type: [index_stats]): UnavailableShardsException[[.marvel-2015.02.25][0] [2] shardIt, [0] active : Timeout waiting for [1m], request: org.elasticsearch.action.bulk.BulkShardRequest@2e7693b7]

Note that this error occurs across different indices and different types.

Again, after a few minutes, the Elasticsearch clients get
NoNodeAvailableException. I assume that is because the cluster is
malfunctioning due to the errors above. Eventually the clients fail with
"java.lang.OutOfMemoryError: GC overhead limit exceeded".

I did some profiling and found that a growing number of
org.elasticsearch.action.index.IndexRequest instances is the cause of this
OutOfMemory error. I even tried "index.store.type: memory", but it seems the
Elasticsearch cluster still cannot build the indices at the required rate.

Please point out any tuning parameters, or any method to get rid of these
issues. Alternatively, please suggest a different way to report and query
this amount of load.

Thanks
Malaka

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/35a29ca5-02f6-4fe9-8600-2cdb91c519cf%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

If you are getting queue capacity rejections then you are overworking your
cluster. Are you using the bulk API for your tests?
How much data is in your cluster when you get the OOM?
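A queue-capacity rejection is back-pressure from the cluster, so the usual client-side mitigation is to retry rejected bulks with backoff rather than keep queuing IndexRequests in the client heap. A minimal sketch, where `send` is a hypothetical function returning True on success and False on a rejection:

```python
import time

def bulk_with_backoff(send, body, max_retries=5, base_delay=0.5):
    """Retry a bulk request with exponential backoff when the cluster
    rejects it. `send` is a caller-supplied function (hypothetical here)
    that returns True on success, False on a queue-capacity rejection."""
    for attempt in range(max_retries):
        if send(body):
            return True
        # Back off: 0.5s, 1s, 2s, ... so the cluster can drain its queues.
        time.sleep(base_delay * (2 ** attempt))
    return False

# Example with a fake sender that rejects twice, then accepts.
attempts = []
def fake_send(body):
    attempts.append(body)
    return len(attempts) >= 3

ok = bulk_with_backoff(fake_send, "payload", base_delay=0.001)
```

Backing off also caps the number of in-flight requests, which is exactly the object growth the profiling in this thread points at.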

On 25 February 2015 at 16:28, Malaka Gallage mpgallage@gmail.com wrote:



Hi Mark,

Yes, I'm using the bulk API for the tests. Usually the OOM error happens
when the cluster has around 30 million records. Is there any way to tune the
ES cluster to perform better?

Thanks
Malaka

On Wednesday, February 25, 2015 at 12:25:52 PM UTC+5:30, Mark Walkom wrote:


What sort of queries are you running?

On 25 February 2015 at 22:08, Malaka Gallage mpgallage@gmail.com wrote:


Hi Mark,

I run one of four defined queries, chosen at random, each time:

  1. Getting the average of a field over some time period.
  2. Getting the max of a field over some time period.
  3. Getting the min of a field over some time period.
  4. Getting a percentile of a field over some time period.

Note that only one of these queries runs per second.
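The four query shapes above can be expressed as Elasticsearch aggregation bodies. A sketch of building them in Python (the `timestamp` field and the `now-1h` window are hypothetical, since the sample document does not show a time field; the aggregation names follow the 1.x-era metric aggregations):

```python
import random

def agg_query(kind, field="long_1", since="now-1h"):
    """Build a request body for one of the four metric queries:
    avg, max, min, or a percentile of `field` over a time window."""
    aggs = {
        "avg": {"avg": {"field": field}},
        "max": {"max": {"field": field}},
        "min": {"min": {"field": field}},
        "percentile": {"percentiles": {"field": field, "percents": [95]}},
    }[kind]
    return {
        "size": 0,  # only the aggregation result is needed, not the hits
        "query": {"range": {"timestamp": {"gte": since}}},
        "aggs": {"result": aggs},
    }

# Pick one of the four at random, as described above.
body = agg_query(random.choice(["avg", "max", "min", "percentile"]))
```

With `"size": 0` these queries return no documents, so at one query per second they should be a small fraction of the cluster's load compared to the bulk indexing.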

Thanks
Malaka

On Thursday, February 26, 2015 at 6:26:51 AM UTC+5:30, Mark Walkom wrote:

