[indices:data/read/search[phase/fetch/id]]]; nested: ElasticsearchException[Java heap space]; nested: OutOfMemoryError[Java heap space]

Elasticsearch Version 2.3.2
ES_HEAP_SIZE=2G

I have a use case that requires indexing attachments into ES. Due to certain constraints, I cannot split the docs/attachments and index them as separate docs within ES. I was able to index 14 attachments (each attachment ~130 MB); however, when I try to query, I get the issue below. When I query, I am not requesting all the fields of the documents; in particular, I am not requesting the attachment field.

Sample JSON doc:

{
  "name": "xyz",
  "title": "xx",
  "attachment": "............"
}

[2017-05-04 03:42:22,869][DEBUG][action.search ] [Doop] [17] Failed to execute fetch phase
RemoteTransportException[[Doop][slc12oxp.us.x.com/10.196.3.67:9300][indices:data/read/search[phase/fetch/id]]]; nested: ElasticsearchException[Java heap space]; nested: OutOfMemoryError[Java heap space];
Caused by: ElasticsearchException[Java heap space]; nested: OutOfMemoryError[Java heap space];
at org.elasticsearch.ExceptionsHelper.convertToRuntime(ExceptionsHelper.java:50)
at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:604)
at org.elasticsearch.search.action.SearchServiceTransportAction$FetchByIdTransportHandler.messageReceived(SearchServiceTransportAction.java:408)
at org.elasticsearch.search.action.SearchServiceTransportAction$FetchByIdTransportHandler.messageReceived(SearchServiceTransportAction.java:405)
at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:33)
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:75)
at org.elasticsearch.transport.TransportService$4.doRun(TransportService.java:376)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOfRange(Unknown Source)
at java.lang.String.<init>(Unknown Source)
at java.lang.StringBuilder.toString(Unknown Source)
at com.fasterxml.jackson.core.util.TextBuffer.contentsAsString(TextBuffer.java:356)
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._finishAndReturnString(UTF8StreamJsonParser.java:2412)
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.getText(UTF8StreamJsonParser.java:285)
at org.elasticsearch.common.xcontent.json.JsonXContentParser.text(JsonXContentParser.java:84)
at org.elasticsearch.common.xcontent.support.AbstractXContentParser.readValue(AbstractXContentParser.java:299)
at org.elasticsearch.common.xcontent.support.AbstractXContentParser.readMap(AbstractXContentParser.java:274)
at org.elasticsearch.common.xcontent.support.AbstractXContentParser.readMap(AbstractXContentParser.java:245)
at org.elasticsearch.common.xcontent.support.AbstractXContentParser.map(AbstractXContentParser.java:208)
at org.elasticsearch.common.xcontent.XContentHelper.convertToMap(XContentHelper.java:83)
at org.elasticsearch.search.lookup.SourceLookup.sourceAsMapAndType(SourceLookup.java:88)
at org.elasticsearch.search.lookup.SourceLookup.loadSourceIfNeeded(SourceLookup.java:64)
at org.elasticsearch.search.lookup.SourceLookup.extractRawValues(SourceLookup.java:130)
at org.elasticsearch.search.fetch.FetchPhase.createSearchHit(FetchPhase.java:241)
at org.elasticsearch.search.fetch.FetchPhase.execute(FetchPhase.java:178)
at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:592)
... 9 more

So you have a very big document containing some attachments encoded in BASE64? Everything is stored in the _source field, I guess?

That might explain why it requires some memory to fetch the first 10 docs.

A suggestion is to remove unneeded fields like attachment using the remove ingest processor.
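
For reference, the remove processor is part of the ingest node feature in 5.x; a minimal pipeline sketch, with a hypothetical pipeline name and assuming the raw field is called attachment:

PUT _ingest/pipeline/drop-attachment
{
  "description": "drop the raw attachment field before indexing",
  "processors": [
    { "remove": { "field": "attachment" } }
  ]
}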

BTW, you did not say how you are actually indexing your documents. Are you using ingest-attachment?

Can you also show what your query looks like?

And finally, maybe increase the heap size. :stuck_out_tongue:

Your heap size is too low.
What is the total RAM size?
How many docs in total?

Yes. I have to store it in the _source document because if I don't, then during an update of the same document I will not be able to retain it in the updated document.
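
For context, a partial update in Elasticsearch reads the stored _source, merges the changes in, and reindexes the whole document, so anything missing from _source is silently lost on update. A minimal sketch with hypothetical index, type, and id:

POST myindex/doc/1/_update
{
  "doc": { "title": "updated title" }
}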

A suggestion is to remove unneeded fields like attachment using the remove ingest processor.

Are you talking about the ingest node? I am using ES 2.3.2; it isn't available there, is it? Can you please elaborate?

BTW, you did not say how you are actually indexing your documents. Are you using ingest-attachment?

I am using mapper-attachments, as ingest-attachment isn't available in 2.3.2.
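
For reference, the mapping side of mapper-attachments looks roughly like this (index, type, and field names are just examples):

PUT myindex
{
  "mappings": {
    "doc": {
      "properties": {
        "attachment": { "type": "attachment" }
      }
    }
  }
}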

Query

It is a simple match query with the attachment field removed from the requested fields:

{
  "fields": ["name", "title"],
  "query": {
    "query_string": {
      "query": "position"
    }
  }
}
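
Worth noting: limiting fields like this only shrinks the response; the fetch phase still loads and parses the entire _source on the data node to extract those field values (that is what the SourceLookup frames in the stack trace show), so the base64 is still read into heap. The same caveat should apply to request-level source filtering, sketched below with 2.x syntax:

{
  "_source": {
    "exclude": ["attachment"]
  },
  "query": {
    "query_string": {
      "query": "position"
    }
  }
}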

RAM is 8 GB. I have only a few documents, at most 1000, of which 14 are very big, each about 130 MB.

OK. First, be aware that mapper-attachments is removed in 6.0. So I'd suggest you upgrade to 5.4.0 and use ingest instead.

Check if this helps in that case.

I'd also store the binary document outside elasticsearch and only add the URL to the doc in elasticsearch, along with the extracted text.

I have read that mapper-attachments will be removed starting with 6.0. Upgrading to 5.4 is not smooth due to some business constraints.

The current design in our project makes heavy use of mapper-attachments for attachment processing. We don't want the original base64 document to be stored in ES. Initially we thought that if mapper-attachments supported copy_to, we could avoid storing the base64 in _source, but mapper-attachments doesn't support copy_to. The only reason we are storing the base64 content in _source is so that when updates happen to the same document, I can still retain the original extracted content in the attachment.content field.
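
For comparison, this is what copy_to does on a regular 2.x field, and what we wished the attachment sub-fields supported (field names here are hypothetical):

{
  "mappings": {
    "doc": {
      "properties": {
        "title": { "type": "string", "copy_to": "all_text" },
        "all_text": { "type": "string" }
      }
    }
  }
}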

My question: is it possible, or is there a way, to remove the base64 content, keep just the extracted content in the source, and thus make the document safe for updates too (i.e., update the doc without losing any data)?

You can exclude the BASE64 from the source with https://www.elastic.co/guide/en/elasticsearch/reference/2.4/mapping-source-field.html#include-exclude
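
A minimal sketch of that mapping on 2.x (index and type names are examples):

PUT myindex
{
  "mappings": {
    "doc": {
      "_source": {
        "excludes": ["attachment"]
      },
      "properties": {
        "attachment": { "type": "attachment" }
      }
    }
  }
}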

Also have a look at the FSCrawler project in case it helps.

If I exclude it from _source, what would happen when I update the document? I would lose it after the update, right? For an update, the contents need to be in the _source field.

Probably. That's why I'd advise upgrading to 5.4 and using ingest-attachment.
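
On 5.x the whole flow could be a single pipeline: extract the text with the attachment processor, then drop the raw base64 with the remove processor, so _source keeps only the extracted content and survives updates (pipeline, index, and field names are examples):

PUT _ingest/pipeline/attachment
{
  "description": "extract text, then drop the raw base64",
  "processors": [
    { "attachment": { "field": "data" } },
    { "remove": { "field": "data" } }
  ]
}

PUT myindex/doc/1?pipeline=attachment
{
  "data": "..."
}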
