The main obstacle here is how documents are examined for indexing.
The Lucene API accepts only whole documents, and for good reasons: it does not just index word by word, which would be the naive approach, but also builds statistics of terms and their frequencies per document and per segment (per index).
To build these statistics, the document has to be held in memory as a whole. That is why a 500 MB document does not fit into the default 1 GB heap. Maybe this process can be chunked in the future, but that is very hard: it would require stateful logic in the Lucene API so that Lucene could suspend and resume indexing at arbitrary points.
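What you can do today is chunk on the client side, before the document ever reaches ES, so that each chunk is indexed as its own small document. A minimal sketch (chunk size and overlap are arbitrary example values, not recommendations):

```python
def chunk_text(text, chunk_size=1_000_000, overlap=1000):
    """Split text into overlapping chunks so no single indexed
    document exceeds chunk_size characters.
    Assumes overlap < chunk_size, otherwise the loop never advances."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        # overlap so phrases straddling a boundary stay searchable
        start = end - overlap
    return chunks

# Each chunk would then become its own ES document, e.g. with
# fields like {"doc_id": ..., "chunk_no": i, "content": chunk}.
```

The overlap is a trade-off: it duplicates a little data but keeps phrase queries working across chunk boundaries.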
With this in mind, the workaround is to provision really large, expensive machines with several hundred gigabytes of RAM and raise the JVM heap configuration accordingly. Even then, processing the gigabyte-sized documents you talk about will take minutes if not hours, and ES will appear stalled until the task completes.
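For reference, that heap tuning goes into ES's `config/jvm.options`; the value below is purely illustrative:

```
-Xms31g
-Xmx31g
```

Note that the ES documentation generally advises setting `-Xms` and `-Xmx` to the same value and staying below roughly 32 GB, so the JVM can keep using compressed object pointers. Which is another way of saying: you cannot throw arbitrarily much heap at this problem anyway.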
But the story continues. After indexing (whether in one pass or chunked), it is close to impossible to retrieve the document from the index. The `_source` field will be enormous, and the memory used to deliver it will drain resources from other search and index tasks, which then tend to hang. There is no sense in that. And if you think "then I'll just disable `_source` and use highlighting on the field" could be the solution, it will not work either: the highlighter operates on the complete field in the index, which saves no memory at all and even slows down the delivery of hits.
Maybe you have already found answers about ES search on large documents; if so, I'd be happy to hear them.
So if you ask me, the use case of "index files no matter what" is not rational with regard to ES indexing and document delivery, because it has not been thought through to the end. Therefore it does not deserve a solution on the ES side.
So why not, besides indexing in chunks, simply store the documents on a filesystem, index only the file metadata, and point to them with a URL? Or preprocess the documents and extract only the most significant words?
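The second idea needs nothing more than a term-frequency pass to get started; the stopword list and cutoffs below are placeholders, and a real setup would use something like TF-IDF against a corpus instead:

```python
import re
from collections import Counter

# placeholder stopword list; use a proper one in practice
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it"}

def significant_words(text, top_n=100):
    """Return the top_n most frequent non-stopword terms, as a crude
    stand-in for real keyword extraction."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter(w for w in words
                     if w not in STOPWORDS and len(w) > 2)
    return [w for w, _ in counts.most_common(top_n)]

# The extracted terms plus the file metadata (name, size, URL pointing
# at the original on the filesystem) would then form the actual,
# small ES document.
```

That keeps the index lean: search runs over the keywords and metadata, and the URL hands the full file back without ES ever having to hold it in heap.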