Unable to index a file (Word document) greater than 45 MB

We are trying to index a file (a Word document) that is larger than 45 MB. It fails with the exception below:
Maximum timeout reached while retrying request. Call: Status code unknown from: PUT /test_index6/_doc/16601?pipeline=Attachment

What is the maximum file size recommendation and how do we increase the size limit to handle large file indexing?

There are some limits, such as no more than 100mb for a single HTTP request by default.
45mb of binary data becomes roughly a third bigger when you Base64-encode it.

So be careful with that.
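To put a rough number on the Base64 overhead: every 3 bytes of input become 4 output characters, so the encoded payload is about a third larger than the file. Here is a minimal Python sketch of that arithmetic. The helper names are made up for illustration, the 100mb figure is the default value of http.max_content_length, and the check ignores the JSON wrapper around the encoded field.

```python
import base64

# Default value of http.max_content_length (100mb, binary units).
DEFAULT_MAX_CONTENT_LENGTH = 100 * 1024 * 1024


def base64_encoded_size(raw_bytes: int) -> int:
    """Size in bytes of the padded Base64 encoding of `raw_bytes` bytes."""
    # Every group of 3 input bytes (rounded up) becomes 4 output characters.
    return (raw_bytes + 2) // 3 * 4


def fits_in_request(raw_bytes: int, limit: int = DEFAULT_MAX_CONTENT_LENGTH) -> bool:
    """Rough check only: ignores the JSON envelope and HTTP headers."""
    return base64_encoded_size(raw_bytes) < limit


if __name__ == "__main__":
    # Sanity check the formula against the real encoder on a small input.
    sample = b"x" * 10
    assert base64_encoded_size(len(sample)) == len(base64.b64encode(sample))

    size = 45 * 1024 * 1024
    print(base64_encoded_size(size), fits_in_request(size))
```

By this estimate a 45 MB file encodes to about 60 MB, so the body grows noticeably but can still sit below the default limit. The same arithmetic also applies to the memory the client needs to build the request.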

That's one of the reasons I prefer to extract the text and metadata before sending it to Elasticsearch. That's basically what the FSCrawler project does.

Disclaimer: I'm the author of FSCrawler.
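As an illustration of the "extract first, then index" idea, here is a minimal Python sketch that pulls the plain text out of a .docx file using only the standard library. This is not how FSCrawler works (it relies on Apache Tika and handles many more formats); it only covers .docx, not the legacy .doc format, and the function names and the document field names are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespace used by WordprocessingML elements inside a .docx archive.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def extract_docx_text(path_or_file) -> str:
    """Return the plain text of a .docx file, one line per paragraph."""
    # A .docx file is a zip archive; the main body lives in word/document.xml.
    with zipfile.ZipFile(path_or_file) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # Text runs are stored in <w:t> elements inside each paragraph.
        runs = [node.text or "" for node in para.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)


def build_document(path: str) -> dict:
    """Build a small JSON-ready document; field names are illustrative only."""
    return {"filename": path, "content": extract_docx_text(path)}
```

The resulting document contains only the extracted text, which is usually far smaller than the Base64-encoded binary, and it can be indexed without the attachment pipeline.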

Thanks David.

Following this page from Elastic: General recommendations | Elasticsearch Guide [7.12] | Elastic

we set http.max_content_length to 350 MB in the yml file, but no luck.

It says custom limit is 2 GB. Am I getting it wrong?

What do the Elasticsearch logs show, and what is the full stack trace of the error?

Error Message : Exception of type 'System.OutOfMemoryException' was thrown.
Stack Trace:
   at Elasticsearch.Net.Transport`1.Request[TResponse](HttpMethod method, String path, PostData data, IRequestParameters requestParameters) in E:\Projects\Elastic\elasticsearch-net-copy\src\Elasticsearch.Net\Transport\Transport.cs:line 95
   at Elasticsearch.Net.ElasticLowLevelClient.DoRequest[TResponse](HttpMethod method, String path, PostData data, IRequestParameters requestParameters) in E:\Projects\Elastic\elasticsearch-net-copy\src\Elasticsearch.Net\ElasticLowLevelClient.cs:line 70
   at Nest.ElasticClient.DoRequest[TRequest,TResponse](TRequest p, IRequestParameters parameters, Action`1 forceConfiguration) in E:\Projects\Elastic\elasticsearch-net-copy\src\Nest\ElasticClient.cs:line 133
   at Nest.ElasticClient.Index[TDocument](IIndexRequest`1 request) in E:\Projects\Elastic\elasticsearch-net-copy\src\Nest\ElasticClient.NoNamespace.cs:line 672
   at Nest.ElasticClient.Index[TDocument](TDocument document, Func`2 selector) in E:\Projects\Elastic\elasticsearch-net-copy\src\Nest\ElasticClient.NoNamespace.cs:line 658

Error Message : Maximum timeout reached while retrying request. Call: Status code unknown from: PUT /consult_index6/_doc/16601?pipeline=Attachment
There is no stack trace on the Elasticsearch side for this.

We increased http.max_content_length to 350 MB, but we still see the issue. Also, when we check elasticsearch.log on the server, we don't see much detail.

Hope this helps.

Could it be a client-side error?