I have the following setup for an ELK Platform:
- logstash-forwarder on 20 machines, as follows:
  - 4 machines (Nginx + sys logs)
  - 4 machines (PostgreSQL)
  - 12 machines (Ruby on Rails + sys logs)
And I have one machine that hosts the ELK platform with the following specs:
- 16 GB RAM
- Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz (4 cores)
- SATA 6 Gb/s, 7200 rpm hard disk
I followed the DigitalOcean tutorial to set all of this up (https://www.digitalocean.com/community/tutorials/how-to-install-elasticsearch-logstash-and-kibana-4-on-ubuntu-14-04), but I get the following errors:
/var/log/elasticsearch.log
[2015-10-14 13:56:22,743][ERROR][marvel.agent.exporter ] [General Orwell Taylor] create failure (index:[.marvel-2015.10.14] type: [cluster_stats]): EsRejectedExecutionException[rejected execution (queue capacity 500) on org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$PrimaryPhase$1@65fac6b3]
[2015-10-14 13:58:31,174][ERROR][marvel.agent.exporter ] [General Orwell Taylor] error sending data to [http://127.0.0.1:9200/.marvel-2015.10.14/_bulk]: [SocketTimeoutException[Read timed out]]
[2015-10-14 14:00:57,588][ERROR][marvel.agent.exporter ] [General Orwell Taylor] create failure (index:[.marvel-2015.10.14] type: [node_stats]): EsRejectedExecutionException[rejected execution (queue capacity 500) on org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$PrimaryPhase$1@50605923]
[2015-10-14 14:02:26,153][ERROR][marvel.agent.exporter ] [General Orwell Taylor] error sending data to [http://127.0.0.1:9200/.marvel-2015.10.14/_bulk]: [SocketTimeoutException[Read timed out]]
[2015-10-14 14:04:59,930][ERROR][marvel.agent.exporter ] [General Orwell Taylor] error sending data to [http://127.0.0.1:9200/.marvel-2015.10.14/_bulk]: SocketTimeoutException[Read timed out]
[2015-10-14 14:05:44,498][ERROR][marvel.agent.exporter ] [General Orwell Taylor] create failure (index:[.marvel-2015.10.14] type: [node_stats]): EsRejectedExecutionException[rejected execution (queue capacity 500) on org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$PrimaryPhase$1@5b840dba]
[2015-10-14 14:07:42,643][ERROR][marvel.agent.exporter ] [General Orwell Taylor] create failure (index:[.marvel-2015.10.14] type: [cluster_stats]): EsRejectedExecutionException[rejected execution (queue capacity 500) on org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$PrimaryPhase$1@6126efae]
[2015-10-14 14:13:39,190][ERROR][marvel.agent.exporter ] [General Orwell Taylor] create failure (index:[.marvel-2015.10.14] type: [cluster_stats]): EsRejectedExecutionException[rejected execution (queue capacity 500) on org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$PrimaryPhase$1@1c09e397]
/var/log/logstash.log
{:timestamp=>"2015-10-14T14:18:58.756000+0200", :message=>"Lumberjack input: The circuit breaker has detected a slowdown or stall in the pipeline, the input is closing the current connection and rejecting new connection until the pipeline recover.", :exception=>LogStash::CircuitBreaker::HalfOpenBreaker, :level=>:warn}
{:timestamp=>"2015-10-14T14:18:58.784000+0200", :message=>"Lumberjack input: the pipeline is blocked, temporary refusing new connection.", :level=>:warn}
{:timestamp=>"2015-10-14T14:18:58.826000+0200", :message=>"CircuitBreaker::Open", :name=>"Lumberjack input", :level=>:warn}
{:timestamp=>"2015-10-14T14:18:58.826000+0200", :message=>"Lumberjack input: The circuit breaker has detected a slowdown or stall in the pipeline, the input is closing the current connection and rejecting new connection until the pipeline recover.", :exception=>LogStash::CircuitBreaker::OpenBreaker, :level=>:warn}
{:timestamp=>"2015-10-14T14:19:02.786000+0200", :message=>"Lumberjack input: the pipeline is blocked, temporary refusing new connection.", :level=>:warn}
{:timestamp=>"2015-10-14T14:19:03.754000+0200", :message=>"CircuitBreaker::rescuing exceptions", :name=>"Lumberjack input", :exception=>LogStash::SizedQueueTimeout::TimeoutError, :level=>:warn}
{:timestamp=>"2015-10-14T14:19:03.755000+0200", :message=>"CircuitBreaker::rescuing exceptions", :name=>"Lumberjack input", :exception=>LogStash::SizedQueueTimeout::TimeoutError, :level=>:warn}
{:timestamp=>"2015-10-14T14:19:03.755000+0200", :message=>"Lumberjack input: The circuit breaker has detected a slowdown or stall in the pipeline, the input is closing the current connection and rejecting new connection until the pipeline recover.", :exception=>LogStash::CircuitBreaker::HalfOpenBreaker, :level=>:warn}
And thousands of lines like this one:
{:timestamp=>"2015-10-14T14:20:47.961000+0200", :message=>"retrying failed action with response code: 429", :level=>:warn}
As a result, most of the data is lost and never gets indexed!
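To put rough numbers on how often each failure shows up, the two log files can be tallied with a short script. This is only a minimal sketch: the patterns are copied from the excerpts above, while the category names (`es_queue_rejected`, `ls_retry_429`, and so on) and the helper functions `tally` and `tally_file` are hypothetical names made up for illustration.

```python
import re
from collections import Counter

# Substrings taken from the log excerpts above.
# The category names on the left are invented for this sketch.
PATTERNS = {
    "es_queue_rejected": re.compile(r"EsRejectedExecutionException"),
    "es_bulk_timeout": re.compile(r"SocketTimeoutException"),
    "ls_circuit_breaker": re.compile(r"CircuitBreaker"),
    "ls_retry_429": re.compile(r"response code: 429"),
}


def tally(lines):
    """Count how many lines match each failure pattern."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def tally_file(path):
    """Tally one log file, tolerating undecodable bytes."""
    with open(path, errors="replace") as handle:
        return tally(handle)


if __name__ == "__main__":
    # Paths as given in the question; adjust if the logs live elsewhere.
    for path in ("/var/log/elasticsearch.log", "/var/log/logstash.log"):
        try:
            print(path, dict(tally_file(path)))
        except OSError as exc:
            print(path, "could not be read:", exc)
```

Comparing the count of 429 retries with the count of queue rejections should show whether Logstash is being throttled about as often as Elasticsearch rejects work, which would suggest the bottleneck is on the Elasticsearch side rather than in the forwarders.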