Large zip content


#1

A large (2 GB) zip file whose content is read by Tika is not indexing. Indexing hangs at the bulk request. Smaller zip files have no issues.

The ingest-attachment plugin is used for indexing.

Is there any per-document limit in ES?


(David Pilato) #2

There is a memory limit in the JVM, an HTTP network limit (100 MB, IIRC), ...

Indexing such big binary documents in Elasticsearch is not a good idea IMHO.

You should do the extraction of metadata and text outside Elasticsearch. The FSCrawler project could help.
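The 100 MB HTTP limit mentioned above corresponds to Elasticsearch's `http.max_content_length` setting, whose default is 100mb; requests with a larger body are rejected before indexing even starts. It can be raised in `elasticsearch.yml`, though raising it enough for a 2 GB document is not practical (the value below is illustrative, not a recommendation):

```yaml
# elasticsearch.yml
# Default is 100mb; request bodies above this are rejected.
http.max_content_length: 200mb
```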


#3

Hi @dadoonet ,

Can I use FSCrawler in my indexing Java application?

Fetching the content is not a problem, but committing to Elasticsearch via a bulk request hangs for large zip files.

I have some more data for the same record coming from a DB. The content field comes from extracting the zip file and reading each file with Tika.

Please share your thoughts.


#4

A separate 64 GB server is allocated. Is there a way of committing deltas to the same document? Say 100 MB at a time, done 20 times.
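Elasticsearch has no operation that appends to an existing document's field; an update replaces the field, so a delta commit to the same document in 100 MB pieces is not directly possible. One common workaround is to split the extracted text into bounded chunks and index each chunk as its own (smaller) document. A minimal sketch of the splitting step, with illustrative chunk sizes and class names not taken from the thread:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: split one huge extracted text into bounded pieces so each
// bulk item stays small, instead of sending one 2 GB document.
public class ContentChunker {

    // Split text into pieces of at most maxChars characters each.
    public static List<String> chunk(String text, int maxChars) {
        List<String> parts = new ArrayList<>();
        for (int start = 0; start < text.length(); start += maxChars) {
            parts.add(text.substring(start, Math.min(text.length(), start + maxChars)));
        }
        return parts;
    }
}
```

Each chunk would then go out as a separate bulk item (e.g. with a shared parent/record ID field so the pieces can be related at query time).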


#5

The error message is:

nested: EsRejectedExecutionException[rejected execution of org.elasticsearch.transport.TransportService$7@7b221dce on EsThreadPoolExecutor[bulk, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@49b77e41[Running, pool size = 12, active threads = 12, queued tasks = 5792, completed tasks = 455479]]];


(system) #6

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.