Large zip content

There is the memory limit of the JVM, the HTTP request size limit (100mb IIRC), ...

Indexing very large binary documents in Elasticsearch is not a good idea IMHO.

You should extract the metadata and text outside Elasticsearch and index only the result. The FSCrawler project could help.
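To illustrate the idea, here is a minimal Python sketch that opens a zip on the client side, pulls out the text of each entry, and groups the resulting small JSON documents into `_bulk` request bodies that stay well under the HTTP limit. It is not how FSCrawler works internally. The names `extract_docs`, `bulk_bodies`, `MAX_BULK_BYTES`, `TEXT_EXTENSIONS` and the index name `files` are made up for this example, and it only handles plain-text entries; for PDFs or Office files you would need a real extractor such as FSCrawler or Apache Tika.

```python
import io
import json
import zipfile

# Assumption: keep each request far below the ~100mb HTTP limit mentioned above.
MAX_BULK_BYTES = 10 * 1024 * 1024
# Hypothetical list of entry types we treat as plain text.
TEXT_EXTENSIONS = (".txt", ".md", ".csv")


def extract_docs(zip_bytes):
    """Yield one small dict (path, size, content) per text entry in the zip."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            if not info.filename.lower().endswith(TEXT_EXTENSIONS):
                continue  # binary entries are skipped, not sent to Elasticsearch
            text = archive.read(info).decode("utf-8", errors="replace")
            yield {"path": info.filename, "size": info.file_size, "content": text}


def bulk_bodies(docs, index_name, max_bytes=MAX_BULK_BYTES):
    """Group docs into NDJSON bodies for the _bulk API.

    A new body is started whenever adding the next document would exceed
    max_bytes. A single document larger than max_bytes still gets its own body.
    """
    action = json.dumps({"index": {"_index": index_name}})
    lines, size = [], 0
    for doc in docs:
        entry = action + "\n" + json.dumps(doc) + "\n"
        entry_size = len(entry.encode("utf-8"))
        if lines and size + entry_size > max_bytes:
            yield "".join(lines)
            lines, size = [], 0
        lines.append(entry)
        size += entry_size
    if lines:
        yield "".join(lines)
```

Each string yielded by `bulk_bodies(extract_docs(data), "files")` can then be sent as the body of a `POST /_bulk` request with whatever HTTP client you already use. The point is that Elasticsearch only ever receives the extracted text and metadata, never the archive itself.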