Elasticsearch Load Test goes OutOfMemory

Hi all,

I need some help here. I started a load test for Elasticsearch before using
it in a production environment. I have three EC2 instances, configured in
the following manner, which form an Elasticsearch cluster.

All three machines have the same hardware configuration.

32GB RAM
160GB SSD hard disk
8 core CPU

Machine 01
Elasticsearch server (16GB heap)
Elasticsearch Java client (generates a continuous load and reports to ES - 4GB heap)

Machine 02
Elasticsearch server (16GB heap)
Elasticsearch Java client (generates a continuous load and reports to ES - 4GB heap)

Machine 03
Elasticsearch server (16GB heap)
Elasticsearch Java client (queries ES continuously - 1GB heap)

Note that the two loading clients together generate around 20K records per
second and submit them as bulk requests with an average size of 25 documents.
The other client runs only one query per second. My documents have the
following format.

{
  "_index": "my_index",
  "_type": "my_type",
  "_id": "7334236299916134105",
  "_score": 3.6111107,
  "_source": {
    "long_1": 96186289301793,
    "long_2": 7334236299916134000,
    "string_1": "random_string",
    "long_3": 96186289301793,
    "string_2": "random_string",
    "string_3": "random_string",
    "string_4": "random_string",
    "string_5": "random_string",
    "long_4": 5457314198948537000
  }
}
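For concreteness, the ingest side can be sketched like this: a minimal Python sketch of Elasticsearch's newline-delimited bulk format (one action line plus one source line per document). The index and type names are the ones above; the actual HTTP send is omitted, and the sample field values are from the document shown:

```python
import json

def build_bulk_body(docs, index="my_index", doc_type="my_type"):
    """Serialize documents into the newline-delimited bulk format:
    an action line followed by a source line per document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    # Bulk request bodies must end with a trailing newline.
    return "\n".join(lines) + "\n"

# One batch of average size 25, as described above.
batch = [{"long_1": 96186289301793, "string_1": "random_string"}] * 25
body = build_bulk_body(batch)
```

At 20K records/second and 25 documents per batch, that is roughly 800 bulk requests per second hitting the cluster.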

The problem is, after a few minutes, Elasticsearch reports errors like these
in the logs.

[2015-02-24 08:03:58,070][ERROR][marvel.agent.exporter ] [Gateway] create failure (index:[.marvel-2015.02.24] type: [cluster_stats]): RemoteTransportException[[Marvel Girl][inet[/10.167.199.140:9300]][bulk/shard]]; nested: EsRejectedExecutionException[rejected execution (queue capacity 50) on org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1@76dbf01];

[2015-02-25 04:23:36,459][ERROR][marvel.agent.exporter ] [Wildside] create failure (index:[.marvel-2015.02.25] type: [index_stats]): UnavailableShardsException[[.marvel-2015.02.25][0] [2] shardIt, [0] active : Timeout waiting for [1m], request: org.elasticsearch.action.bulk.BulkShardRequest@2e7693b7]

Note that this error occurs across different indices and different types.

Again, after a few minutes, the Elasticsearch clients get
NoNodeAvailableException. I assume that is because the cluster is
malfunctioning due to the errors above. Eventually the clients fail with
"java.lang.OutOfMemoryError: GC overhead limit exceeded".

I did some profiling and found that a growing number of
org.elasticsearch.action.index.IndexRequest instances is the cause of this
OutOfMemory error. I even tried "index.store.type: memory", but it seems the
Elasticsearch cluster still cannot build the indices at the required rate.

Please point out any tuning parameters, or any method to get rid of these
issues. Alternatively, please suggest a different way to report and query
this amount of load.

Thanks
Malaka

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/35a29ca5-02f6-4fe9-8600-2cdb91c519cf%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

If you are getting queue capacity rejections then you are overworking your
cluster. Are you using the bulk API for your tests?
How much data is in your cluster when you get the OOM?
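A queue-capacity rejection is back-pressure from the cluster, so the usual client-side mitigation is to retry rejected bulks with backoff rather than keep queuing IndexRequests in the client heap. A minimal sketch, where `send` is a hypothetical function returning True on success and False on a rejection:

```python
import time

def bulk_with_backoff(send, body, max_retries=5, base_delay=0.5):
    """Retry a bulk request with exponential backoff when the cluster
    rejects it. `send` is a caller-supplied function (hypothetical here)
    that returns True on success, False on a queue-capacity rejection."""
    for attempt in range(max_retries):
        if send(body):
            return True
        # Back off: 0.5s, 1s, 2s, ... so the cluster can drain its queues.
        time.sleep(base_delay * (2 ** attempt))
    return False

# Example with a fake sender that rejects twice, then accepts.
attempts = []
def fake_send(body):
    attempts.append(body)
    return len(attempts) >= 3

ok = bulk_with_backoff(fake_send, "payload", base_delay=0.001)
```

Backing off also caps the number of in-flight requests, which is exactly the object growth the profiling in this thread points at.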

On 25 February 2015 at 16:28, Malaka Gallage mpgallage@gmail.com wrote:



Hi Mark,

Yes, I'm using the bulk API for the tests. Usually the OOM error happens
when the cluster has around 30 million records. Is there any way to tune the
ES cluster to perform better?

Thanks
Malaka

On Wednesday, February 25, 2015 at 12:25:52 PM UTC+5:30, Mark Walkom wrote:


What sort of queries are you running?

On 25 February 2015 at 22:08, Malaka Gallage mpgallage@gmail.com wrote:


Hi Mark,

I run one of four defined queries, chosen at random, each time:

  1. Getting the average of a field over some time period.
  2. Getting the max of a field over some time period.
  3. Getting the min of a field over some time period.
  4. Getting a percentile of a field over some time period.

Note that only one of these queries runs per second.
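The four query shapes above can be expressed as Elasticsearch aggregation bodies. A sketch of building them in Python (the `timestamp` field and the `now-1h` window are hypothetical, since the sample document does not show a time field; the aggregation names follow the 1.x-era metric aggregations):

```python
import random

def agg_query(kind, field="long_1", since="now-1h"):
    """Build a request body for one of the four metric queries:
    avg, max, min, or a percentile of `field` over a time window."""
    aggs = {
        "avg": {"avg": {"field": field}},
        "max": {"max": {"field": field}},
        "min": {"min": {"field": field}},
        "percentile": {"percentiles": {"field": field, "percents": [95]}},
    }[kind]
    return {
        "size": 0,  # only the aggregation result is needed, not the hits
        "query": {"range": {"timestamp": {"gte": since}}},
        "aggs": {"result": aggs},
    }

# Pick one of the four at random, as described above.
body = agg_query(random.choice(["avg", "max", "min", "percentile"]))
```

With `"size": 0` these queries return no documents, so at one query per second they should be a small fraction of the cluster's load compared to the bulk indexing.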

Thanks
Malaka

On Thursday, February 26, 2015 at 6:26:51 AM UTC+5:30, Mark Walkom wrote:

