Recently, curl requests to our ES cluster sometimes fail with "curl: (56) Failure when receiving data from the peer".
For example, a curl job run by Jenkins failed today as shown below (removing the alias succeeded, but adding the alias failed):
curl -X DELETE 'http://10.97.54.65:9200/jpl_denorm_band_*/_alias/head_jpl_denorm_band'
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
0 21 0 21 0 0 112 0 --:--:-- --:--:-- --:--:-- 112
{"acknowledged":true}
curl -X PUT http://10.97.54.65:9200/jpl_denorm_band_20180311/_alias/head_jpl_denorm_band
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
curl: (56) Failure when receiving data from the peer
Build step 'Execute shell' marked build as failure
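Curl error 56 is reported while the response is being received, so the PUT may or may not have been applied on the cluster side. One option is to make the alias switch a single atomic request to the _aliases endpoint and retry it a few times, because repeating the same remove-and-add is harmless. This is a sketch under assumptions. The helper names retry and alias_swap_body, the attempt count and the delay are hypothetical. It also assumes your Elasticsearch version accepts the jpl_denorm_band_* wildcard in a remove action, as the DELETE call above does.

```shell
#!/bin/sh
# Hypothetical helper: retry MAX DELAY CMD [ARGS...]
# Runs CMD until it exits 0, at most MAX times, sleeping DELAY seconds
# between attempts. Returns 1 if every attempt fails.
retry() {
  _max=$1 _delay=$2
  shift 2
  _n=1
  until "$@"; do
    if [ "$_n" -ge "$_max" ]; then
      echo "retry: giving up after $_n attempts: $*" >&2
      return 1
    fi
    echo "retry: attempt $_n failed, retrying in ${_delay}s" >&2
    sleep "$_delay"
    _n=$((_n + 1))
  done
  return 0
}

# Hypothetical helper: print an _aliases request body that removes ALIAS from
# every index matching PATTERN and adds it to NEW_INDEX. Elasticsearch applies
# the actions of one _aliases request atomically, so the alias is never left
# pointing at nothing. Names are inserted without JSON escaping, so they must
# not contain quotes or backslashes.
alias_swap_body() {
  # $1 = index pattern to remove from, $2 = new index, $3 = alias
  printf '{"actions":[{"remove":{"index":"%s","alias":"%s"}},{"add":{"index":"%s","alias":"%s"}}]}\n' \
    "$1" "$3" "$2" "$3"
}

# Example for the job above (not executed here):
#   body=$(alias_swap_body 'jpl_denorm_band_*' jpl_denorm_band_20180311 head_jpl_denorm_band)
#   retry 3 5 curl -sS -f -X POST -H 'Content-Type: application/json' \
#     -d "$body" 'http://10.97.54.65:9200/_aliases'
```

With -sS, curl hides the progress meter in the Jenkins output but still prints its error message, and -f makes HTTP errors (400 and above) count as failures, so they are retried as well.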
These are the Elasticsearch logs from around that time:
[2018-03-12T06:05:26,747][WARN ][o.e.t.n.Netty4Transport ] [xelastic101.band] exception caught on transport layer [[id: 0x1969e9e6, L:/10.97.54.65:9300 - R:/10.99.39.9:38746]], closing connection
java.lang.ClassCastException: null
[2018-03-12T06:11:55,684][WARN ][o.e.h.n.Netty4HttpServerTransport] [xelastic101.band] caught exception while handling client http traffic, closing connection [id: 0xfaf4fe37, L:/10.97.54.65:9200 - R:/10.99.39.9:51022]
java.lang.ClassCastException: null
[2018-03-12T06:38:09,485][WARN ][o.e.t.n.Netty4Transport ] [xelastic101.band] exception caught on transport layer [[id: 0x00995d3a, L:/10.97.54.65:9300 - R:/10.99.39.9:39034]], closing connection
java.lang.ClassCastException: null
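To line these warnings up with the Jenkins job times, a small filter can pull the timestamp, the local address and the remote IP out of each "closing connection" warning. In the lines above, port 9200 is the HTTP layer that the curl job talks to and port 9300 is the transport layer. This is a sketch: the function name is hypothetical, the pattern assumes exactly the line format shown above, and the log path in the usage comment is an assumption to adjust to your path.logs setting.

```shell
#!/bin/sh
# Hypothetical helper: read Elasticsearch log lines on stdin and print
# "timestamp local_address remote_ip" for every warning that closed a
# connection. Lines in any other format are ignored.
closed_connections() {
  grep 'closing connection' |
    sed -n 's/^\[\([^]]*\)\].*L:\/\([0-9.]*:[0-9]*\) - R:\/\([0-9.]*\):[0-9]*.*/\1 \2 \3/p'
}

# Example (not executed here; adjust the log path):
#   closed_connections < /var/log/elasticsearch/CLUSTER_NAME.log |
#     awk '{print $2, $3}' | sort | uniq -c
```

The counts per local port and remote IP show whether the failures cluster on one client (here 10.99.39.9 appears on both ports) or are spread across the cluster.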
Also, the GC log:
2018-03-12T06:11:47.318+0900: 1002398.844: [GC pause (G1 Evacuation Pause) (young), 0.0481982 secs]
[Parallel Time: 32.0 ms, GC Workers: 23]
[GC Worker Start (ms): Min: 1002398845.4, Avg: 1002398845.5, Max: 1002398845.5, Diff: 0.1]
[Ext Root Scanning (ms): Min: 1.1, Avg: 1.2, Max: 1.4, Diff: 0.3, Sum: 27.7]
[Update RS (ms): Min: 23.4, Avg: 23.8, Max: 24.7, Diff: 1.2, Sum: 546.3]
[Processed Buffers: Min: 27, Avg: 41.2, Max: 62, Diff: 35, Sum: 948]
[Scan RS (ms): Min: 0.4, Avg: 1.2, Max: 1.5, Diff: 1.1, Sum: 28.6]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
[Object Copy (ms): Min: 5.2, Avg: 5.4, Max: 5.5, Diff: 0.2, Sum: 123.7]
[Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.4]
[Termination Attempts: Min: 1, Avg: 1.5, Max: 3, Diff: 2, Sum: 34]
[GC Worker Other (ms): Min: 0.1, Avg: 0.1, Max: 0.2, Diff: 0.2, Sum: 3.3]
[GC Worker Total (ms): Min: 31.6, Avg: 31.7, Max: 31.9, Diff: 0.3, Sum: 730.1]
[GC Worker End (ms): Min: 1002398877.1, Avg: 1002398877.2, Max: 1002398877.3, Diff: 0.2]
[Code Root Fixup: 0.1 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 1.9 ms]
[Other: 14.3 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 8.4 ms]
[Ref Enq: 0.5 ms]
[Redirty Cards: 0.2 ms]
[Humongous Register: 0.5 ms]
[Humongous Reclaim: 0.3 ms]
[Free CSet: 2.9 ms]
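The pause above took about 0.048 s, which is short. To check whether any longer pause lines up with the failures, a filter like the following prints only the pause summary lines above a threshold. It is a sketch that assumes the PrintGCDetails line format shown above; the function name, the threshold and the gc.log path (set by -Xloggc in jvm.options, if GC logging is enabled) are assumptions.

```shell
#!/bin/sh
# Hypothetical helper: read a HotSpot GC log on stdin and print every
# "GC pause" or "Full GC" summary line whose pause time (the number just
# before the last "secs]" on that line) exceeds THRESHOLD seconds.
long_gc_pauses() {
  awk -v t="$1" '/GC pause|Full GC/ {
    s = ""
    for (i = 2; i <= NF; i++) if ($i == "secs]") s = $(i - 1)
    if (s != "" && s + 0 > t + 0) print
  }'
}

# Example (not executed here): long_gc_pauses 1 < gc.log
```

Detail lines such as "[Ref Proc: 8.4 ms]" are skipped because they contain neither "GC pause" nor "Full GC".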
Kibana monitoring logs
Neither the logs nor the monitoring charts show any obvious trouble.
What else can I check to investigate this problem?