java.lang.OutOfMemoryError - how to anticipate memory usage?

Problem:
Indexing tens of millions of small (200 B) docs. During indexing,
memory usage is gradually rising, see:
http://i.imgur.com/xfI98.png

As soon as ES reaches the max memory limit I get a WARN:
failed engine java.lang.OutOfMemoryError: Java heap space

From that moment on, the node is dead.

Questions:
1) Is there any way to anticipate how much memory is needed if I want
to store 5,000,000,000 docs?
2) Why is memory gradually rising? Does memory usage grow with the
number of docs?

My config:
Using 1 index, default min/max memory limits.

cluster.name: mwegorekTest

gateway.type: fs
gateway.fs.location: /mnt/mwegorek
gateway.fs.snapshot_interval: 30s
gateway.recover_after_nodes: 2
gateway.recover_after_time: 1m
gateway.expected_nodes: 4

discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.timeout: 3s
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["test-indexer-1.atm:9300", "test-indexer-2.atm:9300", "test-indexer-3.atm:9300", "test-indexer-4.atm:9300"]

threadpool:
  index:
    type: blocking
    min: 4
    size: 10
    wait_time: 60s

On Fri, Nov 18, 2011 at 3:57 PM, Michal Wegorek wegorekm@gmail.com wrote:

Questions:
1) Is there any way to anticipate how much memory is needed if I want
to store 5,000,000,000 docs?

Hard to tell without understanding the structure of the docs, what gets
indexed, and so on. You can extrapolate the memory that will be needed
by indexing a subset of the docs.
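
One way to run that extrapolation, as a back-of-the-envelope sketch (not from the thread): index a known subset, record heap before and after, and scale up to the target doc count. The heap figures below are placeholders you would read from the node's stats, and the 1.5x margin is an arbitrary safety factor, since growth is not strictly linear.

public class HeapExtrapolation {
    public static void main(String[] args) {
        long baselineHeapBytes = 150L * 1024 * 1024; // heap right after startup (placeholder)
        long sampleHeapBytes = 600L * 1024 * 1024;   // heap after indexing the sample (placeholder)
        long sampleDocs = 10000000L;                 // docs indexed in the test run
        long targetDocs = 5000000000L;               // the 5e9 docs asked about above

        double bytesPerDoc = (double) (sampleHeapBytes - baselineHeapBytes) / sampleDocs;
        double naiveBytes = baselineHeapBytes + bytesPerDoc * targetDocs;
        double padded = naiveBytes * 1.5;            // pad, because growth is not strictly linear

        System.out.printf("naive: %.1f GB, with 1.5x margin: %.1f GB%n",
                naiveBytes / (1 << 30), padded / (1 << 30));
    }
}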

2) Why is memory gradually rising? Does memory usage grow with the
number of docs?

Yes, the more docs you have, the more memory will be needed, but it's not
linear. Memory usage also depends on whether faceting is used, sorting, the
number of fields indexed, and how the text gets analyzed.


Shay, thank you for the reply. I'm currently preparing the
extrapolation.

This is the record being indexed (as a Python template):

data = r'''{"file" : {
    "id" : "000000000%d",
    "name" : "abc%d",
    "date" : "12.12.%d"
}}'''

%d starts at 0 and is incremented for each record.

The data is indexed with a simple pycurl HTTP PUT, no additional
settings; all fields are indexed.
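
For reference, a minimal sketch of the same HTTP PUT in plain Java (the thread uses pycurl; the index name, type, and URL below are assumptions, since the post doesn't name them):

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class PutDoc {
    public static void main(String[] args) throws Exception {
        long i = 0; // the %d counter from the template above
        String doc = String.format(
                "{\"file\" : {\"id\" : \"000000000%d\", \"name\" : \"abc%d\", \"date\" : \"12.12.%d\"}}",
                i, i, i);
        // Index/type names are placeholders; the post does not give them.
        URL url = new URL("http://localhost:9200/files/file/" + i);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("PUT");
        conn.setDoOutput(true);
        OutputStream out = conn.getOutputStream();
        out.write(doc.getBytes("UTF-8"));
        out.close();
        System.out.println("HTTP " + conn.getResponseCode());
    }
}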

At around 1e8 records, 1024 MB of heap is no longer enough and
Elasticsearch raises java.lang.OutOfMemoryError.

Q1) Does Elasticsearch buffer index requests that cannot be served
immediately by the hard drive, or does it handle them in a blocking way?

Q2) Is this the correct setting to make indexing behave in a blocking
way?

threadpool:
  index:
    type: blocking

Q3) Memory rises together with the number of records. You said the
growth is not linear; is it logarithmic?

ES version: 0.18.2

Best,
Michal.


Hello,
I use the Java Bulk API to index the following 5 documents into an index called javabulkindex with a mapping type called javabulkmappingtype, on ES 0.18.4 and 0.17.6:

BulkRequestBuilder BRB = client.prepareBulk();

BRB.add(client.prepareIndex().setIndex("javabulkindex").setType("javabulkmappingtype")
        .setId("ID1").setSource("field", JSONObject.fromObject(JsonString1)));

BRB.add(client.prepareIndex().setIndex("javabulkindex").setType("javabulkmappingtype")
        .setId("ID2").setSource("field", JSONObject.fromObject(JsonString2)));

BRB.add(client.prepareIndex().setIndex("javabulkindex").setType("javabulkmappingtype")
        .setId("ID3").setSource("field", JSONObject.fromObject(JsonString2)));

BRB.add(client.prepareIndex().setIndex("javabulkindex").setType("javabulkmappingtype")
        .setId("ID4").setSource("field", "value1"));

BRB.add(client.prepareIndex().setIndex("javabulkindex").setType("javabulkmappingtype")
        .setId("ID5").setSource("field", "value2"));

BulkResponse br = BRB.execute().actionGet();

if (br.hasFailures()) {
    System.out.println("\nbulk failure with error: " + br.buildFailureMessage());
}

Two problems:

  1. If this index and mapping (javabulkindex and javabulkmappingtype) have been created before (with number_of_shards=7, number_of_replicas=2), then all the above commands fail with the error message "cannot allocate shards...".

  2. If I delete this index and mapping and run these commands again, then:

  • an index and mapping with these names are created;
  • the first three commands succeed, but the last two fail with the error messages "try to parse string value as object but it seems that they are provided a value" and "a problem with EOF...". Note: in these last two commands I used a String instead of an object for the field's value;
  • if I remove the first three commands, then the last two succeed.

So, please explain the reason to me and how to solve it. I really want to bulk documents into an existing index and mapping that were created before (problem 1), and to bulk all five commands successfully (problem 2).

Thanks a lot
Tran CD



How many nodes do you start? If you create an index with 2 replicas and
have just one data node to index into, it will not index, because not
enough replicas are allocated (it expects a quorum to index).

Regarding your other problem, what is "field"? Is it a String, or a JSON
object? It can't be a String in some documents and a JSON object / Map in
other documents.
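
One way to keep the mapping consistent, as a minimal sketch (not from the thread): wrap bare string values in a small map so "field" stays an object in every document. This reuses the BRB builder from the code above; the inner key name "value" is an arbitrary choice.

// Sketch: keep "field" an object in every document by wrapping bare
// strings, so the field's mapping never flips between string and object.
BRB.add(client.prepareIndex("javabulkindex", "javabulkmappingtype", "ID4")
        .setSource("field", java.util.Collections.singletonMap("value", "value1")));
BRB.add(client.prepareIndex("javabulkindex", "javabulkmappingtype", "ID5")
        .setSource("field", java.util.Collections.singletonMap("value", "value2")));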


Thanks for your response. Everything is tested on a single machine in my development box. There are two successfully created indices: the first has 9 shards and 2 replicas, the second has 7 shards and 3 replicas. If I index each document individually into one of the two indices, it always succeeds. If I use bulk, it sometimes fails with the error message "Failed to create shard, message [IndexShardCreationException..." or "UnavailableShardsException...", but sometimes it succeeds after I restart the machine. In general, the failure only appears when I use bulk.
So I can't explain why.



If you start just one node and create an index with 2 replicas, any
indexing operation will fail because there isn't a quorum of shard copies
allocated (just 1, because you have 1 node). This is explained in the write
consistency section of the index operation documentation.

Here is an example. Start one node, and create an index with 2 replicas:

curl -XPUT localhost:9200/test1 -d '{
  "settings" : {
    "index.number_of_replicas" : 2
  }
}'

Then try to index a document and watch it fail (lower the timeout to fail
fast):

curl -XPUT localhost:9200/test1/type1/1?timeout=10s -d '{}'
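
The arithmetic behind that failure, as a small sketch (my reading of the quorum rule above, not code from the thread): a write needs more than half of a shard's copies (primary plus replicas) to be active.

public class QuorumCheck {
    public static void main(String[] args) {
        int replicas = 2;
        int copies = 1 + replicas;    // primary + replicas = 3 copies per shard
        int quorum = copies / 2 + 1;  // more than half: 2 active copies required
        int activeCopies = 1;         // a single node can only allocate the primary
        System.out.println("write allowed: " + (activeCopies >= quorum)); // false
    }
}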
