java.io.IOException: java.util.concurrent.TimeoutException: Idle timeout expired


(Phil Yardley) #1

Whilst updating records in the index this morning, our elastic service decided to spit out it's dummy and come to an abrupt halt.

In the logs we can see lots of GC activity, and if I'm understanding the numbers correctly, GC seems to be slowing down (or taking longer)..

The post is a little long so will put the log snippet for the GC in a seperate post below!

The failure:

[2018-11-26T07:50:36,814][WARN ][o.e.j.s.ServletHandler   ] /availabilityv2%2f460/_bulk_docs
java.io.IOException: java.util.concurrent.TimeoutException: Idle timeout expired: 33128/30000 ms
	at org.eclipse.jetty.server.HttpInput$ErrorState.noContent(HttpInput.java:831) ~[jetty-server-9.3.13.v20161014.jar:9.3.13.v20161014]
	at org.eclipse.jetty.server.HttpInput.read(HttpInput.java:189) ~[jetty-server-9.3.13.v20161014.jar:9.3.13.v20161014]
	at org.apache.commons.io.IOUtils.read(IOUtils.java:2454) ~[commons-io-2.4.jar:2.4]
	at org.apache.commons.io.IOUtils.readFully(IOUtils.java:2537) ~[commons-io-2.4.jar:2.4]
	at com.couchbase.capi.servlet.CAPIServlet.handleBulkDocs(CAPIServlet.java:520) ~[couchbase-capi-server-1.6.2.jar:?]
	at com.couchbase.capi.servlet.CAPIServlet.service(CAPIServlet.java:89) ~[couchbase-capi-server-1.6.2.jar:?]
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) ~[javax.servlet-api-3.1.0.jar:3.1.0]
	at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:845) ~[jetty-servlet-9.3.13.v20161014.jar:9.3.13.v20161014]
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:584) [jetty-servlet-9.3.13.v20161014.jar:9.3.13.v20161014]
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) [jetty-server-9.3.13.v20161014.jar:9.3.13.v20161014]
	at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:513) [jetty-security-9.3.13.v20161014.jar:9.3.13.v20161014]
	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226) [jetty-server-9.3.13.v20161014.jar:9.3.13.v20161014]
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) [jetty-server-9.3.13.v20161014.jar:9.3.13.v20161014]
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512) [jetty-servlet-9.3.13.v20161014.jar:9.3.13.v20161014]
	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) [jetty-server-9.3.13.v20161014.jar:9.3.13.v20161014]
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112) [jetty-server-9.3.13.v20161014.jar:9.3.13.v20161014]
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) [jetty-server-9.3.13.v20161014.jar:9.3.13.v20161014]
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) [jetty-server-9.3.13.v20161014.jar:9.3.13.v20161014]
	at org.eclipse.jetty.server.Server.handle(Server.java:523) [jetty-server-9.3.13.v20161014.jar:9.3.13.v20161014]
	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320) [jetty-server-9.3.13.v20161014.jar:9.3.13.v20161014]
	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251) [jetty-server-9.3.13.v20161014.jar:9.3.13.v20161014]
	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273) [jetty-io-9.3.13.v20161014.jar:9.3.13.v20161014]
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95) [jetty-io-9.3.13.v20161014.jar:9.3.13.v20161014]
	at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93) [jetty-io-9.3.13.v20161014.jar:9.3.13.v20161014]
	at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303) [jetty-util-9.3.13.v20161014.jar:9.3.13.v20161014]
	at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148) [jetty-util-9.3.13.v20161014.jar:9.3.13.v20161014]
	at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136) [jetty-util-9.3.13.v20161014.jar:9.3.13.v20161014]
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671) [jetty-util-9.3.13.v20161014.jar:9.3.13.v20161014]
	at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589) [jetty-util-9.3.13.v20161014.jar:9.3.13.v20161014]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]
Caused by: java.util.concurrent.TimeoutException: Idle timeout expired: 33128/30000 ms
	at org.eclipse.jetty.io.IdleTimeout.checkIdleTimeout(IdleTimeout.java:166) ~[?:?]
	at org.eclipse.jetty.io.IdleTimeout$1.run(IdleTimeout.java:50) ~[?:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_151]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_151]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) ~[?:1.8.0_151]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) ~[?:1.8.0_151]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_151]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_151]
	... 1 more

even though there's no reference to memory issues in the logs (edit - yes there is.. see below), I've since increased the jvm heap size to 8GB (we scaled up the AWS instances to 16GB machines temporarily)

the index isn't "massive" by ES standards!

pri rep docs.count docs.deleted store.size pri.store.size
  1   2    2782021      1933869      2.8gb            1gb

is there anything further I should be checking to ensure this doesn't fall over tomorrow when updating the index?

thanks

Phil


(Phil Yardley) #2

GC snippet:

[2018-11-26T07:49:10,081][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][old][572637][957] duration [6.1s], collections [1]/[6.5s], total [6.1s]/[59.5m], memory [2.6gb]->[2.7gb]/[2.9gb], all_pools {[young] [343.5mb]->[435.8mb]/[532.5mb]}{[survivor] [0b]->[0b]/[66.5mb]}{[old] [2.3gb]->[2.3gb]/[2.3gb]}
[2018-11-26T07:49:10,081][WARN ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][572637] overhead, spent [6.1s] collecting in the last [6.5s]
[2018-11-26T07:49:17,822][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][old][572638][958] duration [7.3s], collections [1]/[7.7s], total [7.3s]/[59.7m], memory [2.7gb]->[2.8gb]/[2.9gb], all_pools {[young] [435.8mb]->[487.5mb]/[532.5mb]}{[survivor] [0b]->[0b]/[66.5mb]}{[old] [2.3gb]->[2.3gb]/[2.3gb]}
[2018-11-26T07:49:17,822][WARN ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][572638] overhead, spent [7.3s] collecting in the last [7.7s]
[2018-11-26T07:49:24,078][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][old][572639][959] duration [5.9s], collections [1]/[6.2s], total [5.9s]/[59.8m], memory [2.8gb]->[2.8gb]/[2.9gb], all_pools {[young] [487.5mb]->[503mb]/[532.5mb]}{[survivor] [0b]->[0b]/[66.5mb]}{[old] [2.3gb]->[2.3gb]/[2.3gb]}
[2018-11-26T07:49:24,078][WARN ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][572639] overhead, spent [5.9s] collecting in the last [6.2s]
[2018-11-26T07:49:51,807][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][old][572640][963] duration [27.6s], collections [4]/[27.7s], total [27.6s]/[1h], memory [2.8gb]->[2.9gb]/[2.9gb], all_pools {[young] [503mb]->[532.5mb]/[532.5mb]}{[survivor] [0b]->[58.8mb]/[66.5mb]}{[old] [2.3gb]->[2.3gb]/[2.3gb]}
[2018-11-26T07:49:51,807][WARN ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][572640] overhead, spent [27.6s] collecting in the last [27.7s]
[2018-11-26T07:49:57,720][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][old][572641][964] duration [5.9s], collections [1]/[5.9s], total [5.9s]/[1h], memory [2.9gb]->[2.9gb]/[2.9gb], all_pools {[young] [532.5mb]->[530.5mb]/[532.5mb]}{[survivor] [58.8mb]->[47.9mb]/[66.5mb]}{[old] [2.3gb]->[2.3gb]/[2.3gb]}
[2018-11-26T07:49:57,720][WARN ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][572641] overhead, spent [5.9s] collecting in the last [5.9s]
[2018-11-26T07:50:09,419][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][old][572642][966] duration [11.6s], collections [2]/[11.6s], total [11.6s]/[1h], memory [2.9gb]->[2.9gb]/[2.9gb], all_pools {[young] [530.5mb]->[532.5mb]/[532.5mb]}{[survivor] [47.9mb]->[57.4mb]/[66.5mb]}{[old] [2.3gb]->[2.3gb]/[2.3gb]}
[2018-11-26T07:50:09,420][WARN ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][572642] overhead, spent [11.6s] collecting in the last [11.6s]

(Phil Yardley) #3

hmmm... I should read further in to the logs... just found this..

[2018-11-26T07:53:40,567][WARN ][o.e.j.s.ServletHandler   ] Error for /availabilityv2%2f775/_bulk_docs
java.lang.OutOfMemoryError: Java heap space

so hoping that doubling the space will resolve at least for the moment..

Are there any other optimisations that I could look at around this?

thanks