Questions around indexing very large documents have been asked before on the forum. Please have a look at:
Hi ES Team,
This question may have been asked earlier. I am looking for ways to index really large documents (PDF, Word, etc.) using ES. Our application crawls an enterprise Active Directory system, and we have hit some really large documents that we are unable to index as of now. We are seeing an OutOfMemoryError in the ES logs. I must also mention that for large files we pass a URL to a custom plugin running within the ES server. This plugin reads the entire source file into memory from the provided URL befor…
Hello there,
Setup:
The Elasticsearch version we are using is 2.3.5.
We set the ES Java heap size to 4 GB,
and then set bootstrap.mlockall: true.
Use case: We would like to index very large documents, e.g. 1 GB, using the mapper-attachments plugin.
We know that memory is going to be an issue, so we split each 1 GB file into smaller 50 MB chunks and send them to Elasticsearch synchronously.
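The chunking step described above can be sketched as follows. This is a minimal illustration, not the poster's actual code; the generator name and the in-memory demo file are my own:

```python
import io

def iter_chunks(stream, chunk_size=50 * 1024 * 1024):
    """Yield successive fixed-size chunks (bytes) from a file-like object,
    so a large file never has to be held in memory all at once."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk

# Demo with a tiny in-memory "file" and a 3-byte chunk size.
parts = list(iter_chunks(io.BytesIO(b"abcdefg"), chunk_size=3))
print(parts)  # → [b'abc', b'def', b'g']
```

Each yielded chunk would then be sent to Elasticsearch in its own request, keeping client-side memory bounded by the chunk size rather than the file size.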
To append the chunks in Elasticsearch, I have used an update script as shown below:
var updateRe…
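For reference, a scripted-append update against ES 2.x could look like the sketch below. The index/field names (`files`, `content`) and the Groovy snippet are my assumptions, not the poster's actual script, and inline scripting must be enabled in `elasticsearch.yml` for this to run on 2.3.x:

```python
import json

def append_chunk_body(chunk_b64):
    """Build an ES 2.x _update request body that appends one chunk to a
    hypothetical `content` field via an inline Groovy script."""
    return {
        "script": {
            # In ES 2.x Groovy scripts, params are bound as variables by name.
            "inline": "ctx._source.content += chunk",
            "params": {"chunk": chunk_b64},
        },
        # Create the document with an empty field on the first chunk.
        "upsert": {"content": ""},
    }

# The body would be POSTed to /files/doc/<id>/_update for each chunk.
print(json.dumps(append_chunk_body("QUJD"), indent=2))
```

Note that each such update rewrites the whole growing `_source`, so appending many 50 MB chunks to one document still forces the node to hold and reindex the accumulated field, which is why this pattern tends to hit memory limits.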