Indexing 5GB file


What is the maximum size of a file that can be indexed? What are the options for indexing a large file (~5GB) other than breaking it into chunks?


What does this file contain?

It is a tab-delimited file with genomic information, e.g.:

chromosome1 1000 834206 some_text some_value


You're fortunately not the first one to store genomic information in Elasticsearch! You may be interested in reaching out to others to hear what they've done if any of these sound familiar to your own problems:

As for how to specifically deal with this, I think the answer will depend on what your search goals are. 5GB of text is just going to be too much to reasonably index and search as a single document, so some other strategy is going to be necessary.
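One common strategy for a row-oriented file like this is to index each tab-delimited line as its own small document via the bulk API, rather than the whole file as one document. Here is a minimal sketch that converts rows into bulk-API NDJSON; the index name and field names (`chromosome`, `start`, `end`, etc.) are illustrative assumptions, not something from this thread.

```python
import json

def tsv_to_bulk_ndjson(lines, index_name="genomic-data"):
    """Convert tab-delimited genomic rows into Elasticsearch bulk-API NDJSON.

    Each row like "chromosome1<TAB>1000<TAB>834206<TAB>some_text<TAB>some_value"
    becomes one small document, so no single document comes anywhere near
    Lucene's 2GB limit. Field names here are assumptions for illustration.
    """
    out = []
    for line in lines:
        chrom, start, end, name, value = line.rstrip("\n").split("\t")
        # Bulk format: one action line, then one document line.
        out.append(json.dumps({"index": {"_index": index_name}}))
        out.append(json.dumps({
            "chromosome": chrom,
            "start": int(start),
            "end": int(end),
            "name": name,
            "value": value,
        }))
    return "\n".join(out) + "\n"

# Example: one row becomes two NDJSON lines (action + document).
sample = ["chromosome1\t1000\t834206\tsome_text\tsome_value"]
payload = tsv_to_bulk_ndjson(sample)
```

The resulting payload can be POSTed to the `_bulk` endpoint in batches (e.g. a few thousand rows per request) so each HTTP request stays well under the request-size limit.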

Excellent! Thanks Shane for resources. We will get back to you and the community on our solution.

Just to confirm: the maximum size of a file that can be indexed is 20MB?

Lucene (which Elasticsearch uses) has a per-document limit of 2GB, but Elasticsearch accepts requests over HTTP and has a default request-size limit (http.max_content_length) of 100MB. You can increase that, but you'd probably want to read more about it here.
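For reference, raising that limit is a node setting in elasticsearch.yml. A sketch (the 500mb value is just an example, not a recommendation):

```yaml
# elasticsearch.yml
# Raise the maximum HTTP request body size (default is 100mb).
# Bulk requests should still be kept far smaller than this in practice.
http.max_content_length: 500mb
```

Very large single requests put memory pressure on the node, so sending many smaller bulk requests is generally preferable to raising this limit aggressively.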

Thanks Shane, this is helpful!

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.