Hello there,
Setup:
The Elasticsearch version we are using is 2.3.5.
The ES Java heap size is set to 4 GB,
and bootstrap.mlockall is set to true.
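(For reference, the heap size is set through the ES_HEAP_SIZE environment variable and mlockall in elasticsearch.yml, roughly like this:)

ES_HEAP_SIZE=4g            # environment variable read by the ES 2.x startup script

# elasticsearch.yml
bootstrap.mlockall: true   # lock the heap in RAM so it is not swapped out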
Use case: We would like to index very large documents (around 1 GB each) using the mapper-attachments plugin.
We know memory is going to be an issue, so we split each 1 GB file into smaller 50 MB chunks and send them to Elasticsearch synchronously.
To append each chunk to the document already in the index, I use an update script, as shown below:
var updateRequest = new UpdateRequest<DocumentPOCO, object>(indexName, document.GetType(), document.Id)
{
    // Groovy update script: append the new chunk to the existing documentContent field
    Script = "ctx._source.documentContent += appendContent",
    Params = new Dictionary<string, object>
    {
        { "appendContent", document.DocumentContent }
    }
};
var updateRes = _elasticClient.Update(updateRequest);
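The surrounding loop looks roughly like this (a simplified sketch; GetDocumentChunks, documentId and the chunk size constant are placeholders for our real code):

// Read the 1 GB file in 50 MB pieces and send each piece synchronously,
// waiting for the previous update to return before sending the next one.
foreach (var chunk in GetDocumentChunks(filePath, chunkSizeBytes: 50 * 1024 * 1024))
{
    var updateRequest = new UpdateRequest<DocumentPOCO, object>(indexName, typeof(DocumentPOCO), documentId)
    {
        Script = "ctx._source.documentContent += appendContent",
        Params = new Dictionary<string, object>
        {
            { "appendContent", chunk }
        }
    };

    var updateResponse = _elasticClient.Update(updateRequest);
    if (!updateResponse.IsValid)
        break; // stop on the first failed update
}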
We are now consistently getting Java heap out-of-memory exceptions when trying to index more than about 400 MB using the above script.
We were hoping the script would append each chunk to the indexed document without loading all the previous chunks first. This "load the previous chunks before appending the next one" behavior defeats the purpose of sending the document in small chunks.
My first question: is there anything more we can do with the above script to achieve what we want?
If not, my second question: can anyone recommend another way of doing this, i.e. a way for Elasticsearch to index the same document chunk by chunk?
Please assist.
Regards