Large zip content


#1

A large (2 GB) zip file whose content is read by Tika is not indexing. Indexing hangs at the bulk request. Smaller zip files have no issues.

The ingest-attachment plugin is used for indexing.

Is there any per-document limit in ES?


(David Pilato) #2

There is a memory limit in the JVM, an HTTP network limit (100 MB, IIRC), ...

Indexing such big binary documents in Elasticsearch is not a good idea IMHO.

You should do the extraction of metadata and text outside Elasticsearch. The FSCrawler project could help.
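The 100 MB HTTP limit mentioned above corresponds to Elasticsearch's `http.max_content_length` setting, whose default is 100mb; requests with a larger body are rejected before indexing even starts. It can be raised in `elasticsearch.yml`, though raising it enough for a 2 GB document is not practical (the value below is illustrative, not a recommendation):

```yaml
# elasticsearch.yml
# Default is 100mb; request bodies above this are rejected.
http.max_content_length: 200mb
```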


#3

Hi @dadoonet ,

Can I use FSCrawler in my indexing Java application?

Fetching the content is not a problem, but committing to Elasticsearch via a bulk request hangs for large zip files.

I have some more data for the same record coming from a DB. The content field comes from extracting the zip file and reading each file with Tika.

Please share your thoughts.


#4

A separate 64 GB server is allocated. Is there a way of committing deltas to the same document? Say 100 MB at a time, done 20 times.
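Elasticsearch has no operation that appends to an existing document's field; an update replaces the field, so a delta commit to the same document in 100 MB pieces is not directly possible. One common workaround is to split the extracted text into bounded chunks and index each chunk as its own (smaller) document. A minimal sketch of the splitting step, with illustrative chunk sizes and class names not taken from the thread:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: split one huge extracted text into bounded pieces so each
// bulk item stays small, instead of sending one 2 GB document.
public class ContentChunker {

    // Split text into pieces of at most maxChars characters each.
    public static List<String> chunk(String text, int maxChars) {
        List<String> parts = new ArrayList<>();
        for (int start = 0; start < text.length(); start += maxChars) {
            parts.add(text.substring(start, Math.min(text.length(), start + maxChars)));
        }
        return parts;
    }
}
```

Each chunk would then go out as a separate bulk item (e.g. with a shared parent/record ID field so the pieces can be related at query time).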


#5

The error message is:

nested: EsRejectedExecutionException[rejected execution of org.elasticsearch.transport.TransportService$7@7b221dce on EsThreadPoolExecutor[bulk, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@49b77e41[Running, pool size = 12, active threads = 12, queued tasks = 5792, completed tasks = 455479]]];


(system) #6

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.