Elasticsearch Max document length for indexing files


(Rahul Nama) #1

Hi Team

I'm indexing PDF files into Elasticsearch. I convert each file to text with Python and push it to Elasticsearch, so the content of one PDF file becomes one document in Elasticsearch.

If a file is 10mb, it has a lot of content. Will Elasticsearch still index the document? What's the max size of a document in Elasticsearch? Please suggest.


(David Pilato) #2

I think the first limit you will hit is probably the HTTP request size limit (http.max_content_length), which is IIRC 100mb.
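
To make that concrete, here is a rough Python sketch (not taken from FSCrawler or any official client) that estimates the size of the JSON body for one document and compares it to that limit. The 100mb figure is the default as I recall it, and the `content` field name and both helper names are only illustrative assumptions.

```python
import json

# Assumed default HTTP request size limit (100mb); check your own cluster settings.
HTTP_MAX_CONTENT_LENGTH = 100 * 1024 * 1024


def request_body_size(doc):
    """Return the approximate size in bytes of the JSON body for one document."""
    # Compact separators and raw UTF-8 give a close estimate of what a client sends;
    # the exact bytes depend on the client's own serializer.
    body = json.dumps(doc, ensure_ascii=False, separators=(",", ":"))
    return len(body.encode("utf-8"))


def fits_in_one_request(doc, limit=HTTP_MAX_CONTENT_LENGTH):
    """Return True if the serialized document is no larger than the limit."""
    return request_body_size(doc) <= limit


if __name__ == "__main__":
    # Stand-in for the text extracted from a 10mb file.
    doc = {"content": "a" * (10 * 1024 * 1024)}
    print(request_body_size(doc), fits_in_one_request(doc))
```

Text extracted from a 10mb file should normally stay well below the limit, so a single index request should not be rejected on size alone.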


(Rahul Nama) #3

Thanks David @dadoonet

In addition, I've gone through the FSCrawler settings file.

I think it's using one shard.

My ES is running on
OS: Red Hat Enterprise Linux Server release 7.5
RAM - 12 GB
CPU - 4
Heap space:
-Xms1g
-Xmx1g

How much data will FSCrawler be able to handle? I want to index 10gb of files. Will it handle that, or are there any restrictions? Please suggest.


(David Pilato) #4

No idea. I guess the main limit you may hit with a 1gb heap is when you try to parse big files.
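
If you want to know up front which files are likely to be the heavy ones, a small Python sketch like this lists the files above a size threshold. The 50mb threshold and the function name are arbitrary assumptions for illustration, not a known limit of FSCrawler or Elasticsearch.

```python
import os

# Hypothetical threshold for "big" files; tune it to your own setup.
LARGE_FILE_BYTES = 50 * 1024 * 1024


def find_large_files(root, threshold=LARGE_FILE_BYTES):
    """Return (path, size_in_bytes) pairs under root at or above threshold, largest first."""
    large = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            size = os.path.getsize(path)
            if size >= threshold:
                large.append((path, size))
    # Sort by size so the files most likely to cause trouble come first.
    return sorted(large, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for path, size in find_large_files("."):
        print(f"{size / (1024 * 1024):.1f} MB  {path}")
```

Indexing those files first, on their own, would show quickly whether parsing them is a problem.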

But I'll be happy to hear the feedback.