Hello, when I have an Elasticsearch cluster with nothing but the .kibana and .marvel indices in it, my consumer reading in the files will periodically encounter an exception and close, because ES is closing the connection:
17-10-20 17:15:33 epgidvledw1044 ERROR [signafire.packrat:23] - Uncaught exception on async-thread-macro-2
                                                java.lang.Thread.run                 Thread.java:  745
                  java.util.concurrent.ThreadPoolExecutor$Worker.run     ThreadPoolExecutor.java:  622
                   java.util.concurrent.ThreadPoolExecutor.runWorker     ThreadPoolExecutor.java: 1152
                                                                 ...                                  
                                   clojure.core.async/thread-call/fn                   async.clj:  439
                       signafire.packrat.components.rodent.Rodent/fn                  rodent.clj:   97
                    signafire.packrat.components.rodent.Rodent/fn/fn                  rodent.clj:  131
                                                  clojure.core/dorun                    core.clj: 3009
                                                    clojure.core/seq                    core.clj:  137
                                                                 ...                                  
                                                 clojure.core/map/fn                    core.clj: 2629
                 signafire.packrat.components.rodent.Rodent/fn/fn/fn                  rodent.clj:  136
              signafire.packrat.components.rabbitmq/publish-failure!                rabbitmq.clj:   78
                                               langohr.queue/declare                   queue.clj:   75
com.rabbitmq.client.impl.recovery.AutorecoveringChannel.queueDeclare  AutorecoveringChannel.java:  266
                      com.rabbitmq.client.impl.ChannelN.queueDeclare               ChannelN.java:  844
                  com.rabbitmq.client.impl.AMQChannel.exnWrappingRpc             AMQChannel.java:  118
                      com.rabbitmq.client.impl.AMQChannel.privateRpc             AMQChannel.java:  219
                             com.rabbitmq.client.impl.AMQChannel.rpc             AMQChannel.java:  242
                    com.rabbitmq.client.impl.AMQChannel.quiescingRpc             AMQChannel.java:  251
               com.rabbitmq.client.impl.AMQChannel.quiescingTransmit             AMQChannel.java:  316
               com.rabbitmq.client.impl.AMQChannel.quiescingTransmit             AMQChannel.java:  334
                        com.rabbitmq.client.impl.AMQCommand.transmit             AMQCommand.java:  125
                        com.rabbitmq.client.impl.AMQConnection.flush          AMQConnection.java:  518
                   com.rabbitmq.client.impl.SocketFrameHandler.flush     SocketFrameHandler.java:  150
                                      java.io.DataOutputStream.flush       DataOutputStream.java:  123
                                  java.io.BufferedOutputStream.flush   BufferedOutputStream.java:  140
                            java.io.BufferedOutputStream.flushBuffer   BufferedOutputStream.java:   82
                                   java.net.SocketOutputStream.write     SocketOutputStream.java:  161
                             java.net.SocketOutputStream.socketWrite     SocketOutputStream.java:  115
                            java.net.SocketOutputStream.socketWrite0      SocketOutputStream.java     
java.net.SocketException: Broken pipe (Write failed)
         java.lang.Error: java.net.SocketException: Broken pipe (Write failed)
Packrat is the consumer I have that reads documents off RabbitMQ and writes them into Elasticsearch. When running this program under supervision, it restarts and grinds through this process until the maximum number of indices needed has been created, and then everything runs well. I've found some other issues that seem related, but I'd like to note that I'm only creating 1 index for each month in a year, and I'm only allocating 5 shards for each date before 2000 and 25 for each date after 2000. I've attached the seemingly related issues below:
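To make the indexing scheme concrete, here is a rough Python sketch of the settings I'm generating per monthly index. The function and index-naming scheme here are just illustrative (my actual code is Clojure, and the exact behavior for the year 2000 boundary is an assumption in this sketch):

```python
def monthly_index_settings(year: int, month: int) -> dict:
    """Sketch: ES settings for the monthly index covering (year, month).

    5 shards for months before 2000, 25 from 2000 on; the cutoff
    and the index naming scheme are illustrative assumptions.
    """
    shards = 5 if year < 2000 else 25
    return {
        "index_name": f"docs-{year:04d}.{month:02d}",
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": 1,
        },
    }
```

So a 1999 index gets 5 primary shards and a 2017 index gets 25, one index per month.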
The ES cluster is 16 nodes: 1 dedicated master, 2 dedicated client nodes, and the rest data nodes. The servers they run on all have 8-core Intel Xeon E5 CPUs and 64 GB of RAM.
Thanks!