Elasticsearch Max document length for indexing files

Hi Team

I'm indexing PDF files into Elasticsearch. I convert each file to text with Python and push it to Elasticsearch; the content of one PDF file becomes one document in Elasticsearch.
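
A minimal sketch of the one-file-one-document approach described above, using only the Python standard library. It assumes the PDF text has already been extracted (that step is not shown), an Elasticsearch 7.x-style `PUT /<index>/_doc/<id>` endpoint, and hypothetical field names `filename` and `content`. It only builds the HTTP request; sending it is left to the caller.

```python
import hashlib
import json
import urllib.parse
import urllib.request


def build_index_request(es_url: str, index: str, path: str, text: str) -> urllib.request.Request:
    """Build a PUT request that indexes one file's text as one document.

    Hypothetical: the "filename"/"content" field names and the SHA-1 of the
    path used as document id are illustrative choices only.
    """
    # A stable id derived from the path means re-indexing a file overwrites its document.
    doc_id = hashlib.sha1(path.encode("utf-8")).hexdigest()
    url = f"{es_url.rstrip('/')}/{urllib.parse.quote(index)}/_doc/{doc_id}"
    body = json.dumps({"filename": path, "content": text}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


req = build_index_request("http://localhost:9200", "pdfs", "/data/report.pdf", "extracted text")
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would send it; that needs a running cluster.
```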

If a file is 10 MB, which means it has a lot of content, will Elasticsearch still index the document? What is the maximum document size in Elasticsearch? Please advise.

I think the first limit you will hit is probably the HTTP request size limit (`http.max_content_length`), which IIRC is 100mb.
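
To see how close a single document gets to that limit, it is the size of the JSON request body that matters, not the size of the PDF file. JSON escaping (quotes, backslashes, newlines) and UTF-8 encoding of non-ASCII characters can make the body larger than the raw text. A minimal sketch, assuming the default `http.max_content_length` of `100mb` (Elasticsearch byte-size units are powers of 1024) and the same hypothetical `filename`/`content` fields:

```python
import json

# Assumed default of the http.max_content_length setting ("100mb");
# Elasticsearch byte-size units are powers of 1024.
DEFAULT_MAX_CONTENT_LENGTH = 100 * 1024 * 1024


def request_body_size(filename: str, text: str) -> int:
    """Size in bytes of the UTF-8 JSON body for one document (hypothetical fields)."""
    body = json.dumps({"filename": filename, "content": text}, ensure_ascii=False)
    return len(body.encode("utf-8"))


def fits_http_limit(filename: str, text: str, limit: int = DEFAULT_MAX_CONTENT_LENGTH) -> bool:
    """True if the request body is at or under the assumed HTTP content limit."""
    return request_body_size(filename, text) <= limit


ten_mb_of_text = "a" * (10 * 1024 * 1024)
print(request_body_size("report.pdf", ten_mb_of_text))
print(fits_http_limit("report.pdf", ten_mb_of_text))  # True
```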


Thanks David @dadoonet

In addition, I've gone through the FSCrawler settings file.

I think it's using one shard.

My ES is running on:
OS: Red Hat Enterprise Linux Server release 7.5
RAM: 12 GB
CPU: 4
Heap space: -Xms1g -Xmx1g

How much data will FSCrawler be able to handle? I want to index 10 GB of files. Will it handle that, or are there any restrictions? Please advise.

No idea. I guess the main limit I can see with a 1 GB heap is maybe when you try to parse big files.
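
Since big files are the likely pain point with a 1 GB heap, it can help to know the total volume and which files are largest before starting a run. A minimal standard-library sketch; the 10 MB threshold and the `.pdf` filter are hypothetical choices, not FSCrawler settings:

```python
from pathlib import Path

# Hypothetical threshold: files larger than this are reported.
DEFAULT_THRESHOLD = 10 * 1024 * 1024  # 10 MB


def scan_files(root: str, threshold: int = DEFAULT_THRESHOLD, suffix: str = ".pdf"):
    """Return (total_bytes, big_files) for files under root with the given suffix.

    The suffix match is case-insensitive. big_files is a list of
    (size, path) tuples, largest first.
    """
    total = 0
    big = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() == suffix:
            size = path.stat().st_size
            total += size
            if size > threshold:
                big.append((size, str(path)))
    big.sort(reverse=True)
    return total, big


# Example usage (the directory path is a placeholder):
# total, big = scan_files("/path/to/pdfs")
# print(f"{total / 1024**3:.2f} GiB total, {len(big)} files above the threshold")
```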

But I'd be happy to hear your feedback.
