Reindexing from a remote 2.4 cluster to 6.5 with a batch size of 200. I have set

http.max_content_length: "500m"

but I still receive this exception:
{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"Remote responded with a chunk that was too large. Use a smaller batch size."}],"type":"illegal_argument_exception","reason":"Remote responded with a chunk that was too large. Use a smaller batch size.","caused_by":{"type":"content_too_long_exception","reason":"entity content is too long [114048163] for the configured buffer limit [104857600]"}},"status":400}
I believe we intentionally did not make this configurable, though I don't have much memory of what me-from-two-years-ago was thinking. The http.max_content_length setting is about the server's HTTP implementation, while the reindex-from-remote buffer isn't really the same thing.
That 100mb limit mostly exists to prevent reindex-from-remote from taking up a ton of memory during the process: that 100mb ends up being copied a few times, because reindex-from-remote has to build index requests out of it and shuffle them off to the appropriate places.
Generally we recommend using a smaller batch size or skipping the large documents explicitly. I'm aware this isn't super friendly though.
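As a sketch of the second option: if you know which documents are the oversized ones, you can exclude them by ID with a query on the source and copy them separately later. The host, index, and IDs below are only illustrative:

POST _reindex
{
  "source": {
    "remote": {
      "host": "http://old-2.4-cluster:9200"
    },
    "index": "my_index",
    "size": 50,
    "query": {
      "bool": {
        "must_not": {
          "ids": { "values": ["big-doc-1", "big-doc-2"] }
        }
      }
    }
  },
  "dest": {
    "index": "my_index"
  }
}

The query is forwarded to the remote cluster, so it has to use syntax the 2.4 side understands; a bool/must_not/ids filter like this works there.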
Got it.
IMHO, this should be configurable. When I run a reindex I am mostly copying data to my new cluster. Our machines have 64 GB of memory, so the default 100 MB buffer is really too small.