Elasticsearch hangs when trying to upload a large file

I have an environment running Elasticsearch 0.90.2 with two master
nodes (each on its own server) and 8 data nodes (two nodes on each of 4 servers):

  • master 1 - 4 vcpus, 32 GB RAM
  • master 2 - 8 vcpus, 16 GB RAM
  • data 1-4 - each have 4 vcpus and 32 GB RAM

Initially, I was using the BulkRequestBuilder API to load JSON files into ES,
but the process kept hanging. To isolate the problem, I found the largest
file (438MB) and attempted to upload it from the command line using curl.
This also hung, so I tried the next largest file, which was 183MB. curl was
able to post that file without any problems, and it took only about 12
minutes to complete.

The 453MB JSON file that I'm having trouble with contains binary data (the
183MB file also had binary data). This is the curl command I am using:

curl -XPOST 'http://localhost:9200/attachment.index/Attachment/1234345345345' -d @largebinaryfile.json > status.log

and after more than an hour, the progress meter shows:

  % Total    % Received % Xferd  Average Speed          Time             Curr.
                                 Dload  Upload Total    Current  Left    Speed
  0  437M    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
< HTTP/1.1 100 Continue
} [data not shown]
100  437M    0     0  100  437M      0  83466  1:31:41  1:31:41 --:--:--     0

I've let this process run for more than 5 hours but it never completes and
never errors. Any ideas why ES is hanging?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/ac2d0500-8540-48ce-b0a2-23628ab01b0e%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Did you increase the max content length (http.max_content_length) [1]? Also,
I think you should be using "--data-binary @largebinaryfile.json" instead of
the "-d" option.
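To illustrate why the two curl options can behave differently: when -d reads from a file, curl drops carriage returns and newlines from the contents, while --data-binary sends the bytes unchanged. The following is only a rough Python sketch of that difference, not curl's actual implementation. The function names and the sample payload are made up for illustration.

```python
def strip_like_curl_d(data: bytes) -> bytes:
    """Approximate curl's `-d @file` handling: CR and LF bytes are dropped."""
    return data.replace(b"\r", b"").replace(b"\n", b"")


def keep_like_data_binary(data: bytes) -> bytes:
    """Approximate curl's `--data-binary @file` handling: bytes are sent as-is."""
    return data


if __name__ == "__main__":
    # Hypothetical newline-delimited payload; the newlines carry meaning here.
    payload = b'{"index":{}}\n{"title":"a"}\n'
    print(strip_like_curl_d(payload))      # newlines removed
    print(keep_like_data_binary(payload))  # newlines preserved
```

For a single JSON document the stripped newlines are usually harmless, but for newline-delimited bodies such as bulk requests they are significant.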

Thanks,
Matt Weber

[1]

Matt,

I did increase http.max_content_length when I first tried to curl this
file, because I got an error in the logs. I've also tried using the
--data-binary switch instead of -d, but had the same results.

Thx!

--
Melody Whitaker

It looks like you have tweaked the default settings, but I would upload data
in portions of 100 MB at most; the whole length of the upload will require
valuable space on the heap.
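One way to follow this advice when loading many documents through the bulk API is to group them into request bodies that stay under a byte limit. The sketch below is a minimal illustration under stated assumptions, not the poster's loader: the helper names (bulk_lines, batch_by_size) are hypothetical, the index and type names are taken from the curl command in the original post, and no HTTP request is actually sent. Note that a single document larger than the limit cannot be split this way and ends up in a batch of its own.

```python
import json
from typing import Dict, Iterable, Iterator, List

MAX_BATCH_BYTES = 100 * 1024 * 1024  # assumed cap, following the "100M at most" advice


def bulk_lines(index: str, doc_type: str, doc_id: str, doc: Dict) -> bytes:
    """Build one bulk entry: an action line and a source line, each newline-terminated."""
    action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
    return (json.dumps(action) + "\n" + json.dumps(doc) + "\n").encode("utf-8")


def batch_by_size(entries: Iterable[bytes],
                  max_bytes: int = MAX_BATCH_BYTES) -> Iterator[List[bytes]]:
    """Group entries so each batch stays within max_bytes where possible.

    An entry that is larger than max_bytes on its own is yielded alone.
    """
    batch: List[bytes] = []
    size = 0
    for entry in entries:
        if batch and size + len(entry) > max_bytes:
            yield batch
            batch, size = [], 0
        batch.append(entry)
        size += len(entry)
    if batch:
        yield batch


if __name__ == "__main__":
    # Hypothetical documents; real ones would carry the encoded attachment data.
    docs = {"1": {"name": "a.pdf"}, "2": {"name": "b.pdf"}, "3": {"name": "c.pdf"}}
    entries = [bulk_lines("attachment.index", "Attachment", i, d) for i, d in docs.items()]
    for n, batch in enumerate(batch_by_size(entries, max_bytes=200)):
        body = b"".join(batch)  # this would be the body of one _bulk request
        print(n, len(batch), len(body))
```

Each joined body would then be posted as a separate bulk request, so that no single request has to be held on the heap in full beyond the chosen limit.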

Jörg

I upgraded ES to 0.90.7 and it no longer hangs, which is great, but I get an
out-of-memory error after about 7-8 minutes of processing, even though I've
allocated 14g max memory to all data nodes.

--
Melody Whitaker

Hey,

do you have a stack trace to look at? Also, your master nodes usually don't
need so much heap (it can even be counterproductive, as this may lead to
longer garbage collections due to so much memory needing to be cleaned up).

--Alex