Max doc size for indexing over HTTP


(ed perez) #1

I'm trying to index a document over 1gb in size but I get the following
error.

org.elasticsearch.common.netty.handler.codec.frame.TooLongFrameException:
HTTP content length exceeded 1073741824 bytes.
1073741824 = 1gb

My elasticsearch.yml has http.max_content_length set to 1900mb. I found
an issue that says Netty has a 2gb limit (
https://github.com/elasticsearch/elasticsearch/issues/2237 ), so I was
expecting to be able to index documents of up to approximately 2gb. Does
Elasticsearch impose a limit of 1gb even though I specified more than 1gb?
If not, I can file an issue on GitHub.
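
For anyone checking the numbers, here is a small sketch of the arithmetic behind the question. It assumes binary units (1mb = 1024 * 1024 bytes) and assumes the 2gb Netty limit mentioned in the linked issue corresponds to the largest signed 32-bit integer; neither assumption is confirmed in this thread.

```python
# Assumption: size units are binary (1mb = 1024 * 1024 bytes).
MB = 1024 ** 2
GB = 1024 ** 3

limit_in_error = 1073741824   # byte count reported in the TooLongFrameException
configured = 1900 * MB        # http.max_content_length: 1900mb
int_max = 2 ** 31 - 1         # assumed source of the "2gb" limit (signed 32-bit max)

# The limit in the error is exactly 1gb.
print(limit_in_error == GB)

# The configured value lies between the observed 1gb limit and the assumed 2gb cap.
print(configured, limit_in_error < configured < int_max)
```

So the configured value is larger than the limit reported in the error and still below the assumed 2gb cap, which is why the 1gb figure is surprising.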

This is my node info:

{
  "status" : 200,
  "name" : "SOURCEONE-elastic-ubuntu-4",
  "version" : {
    "number" : "1.2.0",
    "build_hash" : "c82387f290c21505f781c695f365d0ef4098b272",
    "build_timestamp" : "2014-05-22T12:49:13Z",
    "build_snapshot" : false,
    "lucene_version" : "4.8"
  },
  "tagline" : "You Know, for Search"
}

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/e32e66f9-0e96-49d0-b3d5-c9c35cd8d10d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Jörg Prante) #2

1gb is a very large document, and it is unusual to index documents of that
size.

There is a limit check against the heap. To process a document of that
length, you need a large heap just to store the document source.
Depending on the analyzer, the heap demand increases even further.

You can index documents of arbitrary length if you preprocess them first
and split them into smaller chunks.
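
As a rough illustration of the preprocessing step, here is a minimal sketch in Python that reads a large text stream piece by piece and cuts it at whitespace, so the whole document never has to sit in memory at once. The function names, the max_chars default, and the field names (parent_id, part, content) are hypothetical and chosen only for this example; how the chunks are then sent to Elasticsearch is left out.

```python
import io
from typing import Dict, Iterator, TextIO


def split_into_chunks(stream: TextIO, max_chars: int = 1_000_000) -> Iterator[str]:
    """Yield pieces of at most max_chars characters, cutting at whitespace when possible."""
    buffer = ""
    while True:
        block = stream.read(max_chars - len(buffer))
        if not block:                 # end of input
            break
        buffer += block
        if len(buffer) < max_chars:   # short read, keep filling the buffer
            continue
        # Cut just after the last space or newline so words stay intact.
        cut = max(buffer.rfind(" "), buffer.rfind("\n")) + 1
        if cut == 0:                  # no whitespace found, cut at the size limit
            cut = len(buffer)
        yield buffer[:cut]
        buffer = buffer[cut:]
    if buffer:                        # whatever is left after the last full chunk
        yield buffer


def chunk_documents(parent_id: str, stream: TextIO, max_chars: int = 1_000_000) -> Iterator[Dict]:
    """Wrap each chunk in a small document that records where it came from."""
    for part, chunk in enumerate(split_into_chunks(stream, max_chars)):
        # Hypothetical field names; adapt them to your own mapping.
        yield {"parent_id": parent_id, "part": part, "content": chunk}


# Small demonstration with an in-memory stream instead of a real file.
sample = io.StringIO("alpha beta gamma delta epsilon")
for doc in chunk_documents("doc-1", sample, max_chars=12):
    print(doc)
```

Each resulting document can then be indexed on its own, and the parent_id and part fields make it possible to reassemble or group the pieces at search time.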

Jörg

(In reply to eperezks perez.ed@gmail.com, Fri, Jun 6, 2014 at 2:49 PM.)



(system) #3