I'm getting several exceptions during bulk indexing. I'm indexing documents that have an array field with a large number of elements (on the order of 1,000 to 10,000); each element is a small JSON object. The documents themselves are large: some are about 3 MB.
I take care not to exceed the payload limit of the bulk request, so I suspect the type and size of the documents I'm indexing may be the root cause of the exceptions.
Is there any limit on the size of a single document that can be ingested? If not, is there a configuration parameter I can tune to fix the problem?
I'm running a single-node Elasticsearch cluster with the default configuration.
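For context, each bulk action I send looks roughly like this (a sketch only; the index name, type, and field names below are made-up placeholders, not my real mapping):

    # hypothetical names: "myindex", "doc", "title", "items" are placeholders
    POST /_bulk
    { "index": { "_index": "myindex", "_type": "doc", "_id": "1" } }
    { "title": "example", "items": [ { "name": "a", "value": 1 }, { "name": "b", "value": 2 } ] }

except that the items array holds on the order of 1,000 to 10,000 of those small objects, which is what brings some documents to about 3 MB.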
Thanks in advance.
You can raise ES_HEAP_SIZE. 3MB is quite large for a document but should be fine. It might be worth looking at the stats APIs to see if you are spending heap on something like field data.
I just checked: my ES_HEAP_SIZE is set to 12G, so that should be fine, right?
About the stats, I'm a bit confused; there are lots of them I can query. May I ask for a couple of example commands you would run to get information about the specific problem of "documents too large"? Thanks again.
I'm fairly sure the problem isn't documents being too large. Though if you have a stack trace from the logs for the OOM, that would help.
It'd be nice to see a graph of your heap usage over time. It should look like a sawtooth, rising slowly and dropping quickly. The time between each quick drop is the important thing, not the actual number. Marvel can do that with point and click. The nodes stats API is the place to get that information. You can also look in the indices stats for the field data cache size.
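For example, something along these lines would do it from the command line (just a sketch; it assumes the node is listening on the default localhost:9200):

    # sample heap usage every 30 seconds; graph it or eyeball the sawtooth
    while true; do
      curl -s 'http://localhost:9200/_nodes/stats/jvm?pretty' | grep heap_used_percent
      sleep 30
    done

    # field data (field cache) size per field, across all indices
    curl -s 'http://localhost:9200/_stats/fielddata?pretty&fields=*'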
OTOH, if you somehow do push a multi-GB bulk request, I wouldn't be surprised to see an error like this.
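If I remember right, though, unless you've raised http.max_content_length (it defaults to 100mb, and the HTTP layer rejects anything bigger), a single bulk request shouldn't get anywhere near multi-GB. For reference, the setting lives in elasticsearch.yml:

    # elasticsearch.yml -- maximum size of a single HTTP request body (default value shown)
    http.max_content_length: 100mb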