Need help with configuration

I have the following setup for an ELK Platform:

  • logstash-forwarder on 20 machines, broken down as follows:
      • 4 machines (Nginx + sys logs)
      • 4 machines (PostgreSQL)
      • 12 machines (Ruby on Rails + sys logs)

And I have one machine that hosts the ELK platform with the following specs:

  • 16GB Ram
  • Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz (4 cores)
  • SATA 6 Gb/s 7200 rpm

I followed the DigitalOcean tutorial (https://www.digitalocean.com/community/tutorials/how-to-install-elasticsearch-logstash-and-kibana-4-on-ubuntu-14-04) to set all of this up, but I get the following errors:

/var/log/elasticsearch.log

[2015-10-14 13:56:22,743][ERROR][marvel.agent.exporter    ] [General Orwell Taylor] create failure (index:[.marvel-2015.10.14] type: [cluster_stats]): EsRejectedExecutionException[rejected execution (queue capacity 500) on org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$PrimaryPhase$1@65fac6b3]
[2015-10-14 13:58:31,174][ERROR][marvel.agent.exporter    ] [General Orwell Taylor] error sending data to [http://127.0.0.1:9200/.marvel-2015.10.14/_bulk]: [SocketTimeoutException[Read timed out]]
[2015-10-14 14:00:57,588][ERROR][marvel.agent.exporter    ] [General Orwell Taylor] create failure (index:[.marvel-2015.10.14] type: [node_stats]): EsRejectedExecutionException[rejected execution (queue capacity 500) on org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$PrimaryPhase$1@50605923]
[2015-10-14 14:02:26,153][ERROR][marvel.agent.exporter    ] [General Orwell Taylor] error sending data to [http://127.0.0.1:9200/.marvel-2015.10.14/_bulk]: [SocketTimeoutException[Read timed out]]
[2015-10-14 14:04:59,930][ERROR][marvel.agent.exporter    ] [General Orwell Taylor] error sending data to [http://127.0.0.1:9200/.marvel-2015.10.14/_bulk]: SocketTimeoutException[Read timed out]
[2015-10-14 14:05:44,498][ERROR][marvel.agent.exporter    ] [General Orwell Taylor] create failure (index:[.marvel-2015.10.14] type: [node_stats]): EsRejectedExecutionException[rejected execution (queue capacity 500) on org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$PrimaryPhase$1@5b840dba]
[2015-10-14 14:07:42,643][ERROR][marvel.agent.exporter    ] [General Orwell Taylor] create failure (index:[.marvel-2015.10.14] type: [cluster_stats]): EsRejectedExecutionException[rejected execution (queue capacity 500) on org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$PrimaryPhase$1@6126efae]
[2015-10-14 14:13:39,190][ERROR][marvel.agent.exporter    ] [General Orwell Taylor] create failure (index:[.marvel-2015.10.14] type: [cluster_stats]): EsRejectedExecutionException[rejected execution (queue capacity 500) on org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$PrimaryPhase$1@1c09e397]

/var/log/logstash.log

{:timestamp=>"2015-10-14T14:18:58.756000+0200", :message=>"Lumberjack input: The circuit breaker has detected a slowdown or stall in the pipeline, the input is closing the current connection and rejecting new connection until the pipeline recover.", :exception=>LogStash::CircuitBreaker::HalfOpenBreaker, :level=>:warn}
{:timestamp=>"2015-10-14T14:18:58.784000+0200", :message=>"Lumberjack input: the pipeline is blocked, temporary refusing new connection.", :level=>:warn}
{:timestamp=>"2015-10-14T14:18:58.826000+0200", :message=>"CircuitBreaker::Open", :name=>"Lumberjack input", :level=>:warn}
{:timestamp=>"2015-10-14T14:18:58.826000+0200", :message=>"Lumberjack input: The circuit breaker has detected a slowdown or stall in the pipeline, the input is closing the current connection and rejecting new connection until the pipeline recover.", :exception=>LogStash::CircuitBreaker::OpenBreaker, :level=>:warn}
{:timestamp=>"2015-10-14T14:19:02.786000+0200", :message=>"Lumberjack input: the pipeline is blocked, temporary refusing new connection.", :level=>:warn}
{:timestamp=>"2015-10-14T14:19:03.754000+0200", :message=>"CircuitBreaker::rescuing exceptions", :name=>"Lumberjack input", :exception=>LogStash::SizedQueueTimeout::TimeoutError, :level=>:warn}
{:timestamp=>"2015-10-14T14:19:03.755000+0200", :message=>"CircuitBreaker::rescuing exceptions", :name=>"Lumberjack input", :exception=>LogStash::SizedQueueTimeout::TimeoutError, :level=>:warn}
{:timestamp=>"2015-10-14T14:19:03.755000+0200", :message=>"Lumberjack input: The circuit breaker has detected a slowdown or stall in the pipeline, the input is closing the current connection and rejecting new connection until the pipeline recover.", :exception=>LogStash::CircuitBreaker::HalfOpenBreaker, :level=>:warn}

And thousands of

{:timestamp=>"2015-10-14T14:20:47.961000+0200", :message=>"retrying failed action with response code: 429", :level=>:warn}

So most of the data is being dropped and never indexed!

It means ES is overloaded.
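One way to confirm that (a suggestion, assuming ES is reachable on localhost:9200 as in the tutorial setup) is to watch the bulk thread pool with the cat API; a growing rejected count matches the EsRejectedExecutionException (queue capacity 500) errors above:

```shell
# Show bulk thread-pool activity and rejections on the local ES node.
# "bulk.rejected" climbing over time means ES cannot keep up with the
# indexing rate and is shedding bulk requests (hence the 429s in Logstash).
curl -s 'http://localhost:9200/_cat/thread_pool?v&h=host,bulk.active,bulk.queue,bulk.rejected'
```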

What is the config you used for ES?

Nothing special... Just the default configurations!

Then you're running with a maximum heap of only 1GB, which won't be helping.
You should increase that; I'd start with a minimum of 4GB.
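For a Debian/Ubuntu package install like the one in that tutorial, the heap is set in /etc/default/elasticsearch (file path and service name assumed from that setup):

```shell
# /etc/default/elasticsearch -- heap size for the packaged ES service.
# Rule of thumb: roughly half of RAM for the heap, leaving the rest for
# the OS filesystem cache. On a 16GB box, 4-8GB is a sensible range.
ES_HEAP_SIZE=4g
```

After editing, restart the service (e.g. `sudo service elasticsearch restart`) for the new heap to take effect.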

Thank you @warkolm, I edited /etc/default/elasticsearch and set ES_HEAP_SIZE=8g, and now I have a very high load (25-30) and can't run Kibana because it times out!

But I'm still getting thousands of the following errors in logstash.log:

{:timestamp=>"2015-10-15T01:35:03.842000+0200", :message=>"retrying failed action with response code: 503", :level=>:warn}
{:timestamp=>"2015-10-15T01:35:35.969000+0200", :message=>"CircuitBreaker::rescuing exceptions", :name=>"Lumberjack input", :exception=>LogStash::SizedQueueTimeout::TimeoutError, :level=>:warn}
{:timestamp=>"2015-10-15T01:35:35.969000+0200", :message=>"Lumberjack input: The circuit breaker has detected a slowdown or stall in the pipeline, the input is closing the current connection and rejecting new connection until the pipeline recover.", :exception=>LogStash::CircuitBreaker::HalfOpenBreaker, :level=>:warn}
{:timestamp=>"2015-10-15T01:35:03.895000+0200", :message=>"Lumberjack input: the pipeline is blocked, temporary refusing new connection.", :level=>:warn}

We had a similar problem where we installed everything on one machine. We started with 4GB and realised pretty soon we needed more...

So we are running with 25GB at the moment for the whole machine (I actually want to go up to 64GB). Heap sizes are as follows:

  • Logstash = 6GB
  • Elasticsearch = 8GB

But we have a problem where we run out of memory every 5 to 6 days and have to restart ES (v1.4.5).

You don't need that much heap for LS; you should be able to move some of that to ES.
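Assuming package installs on Ubuntu, rebalancing the two heaps above could look like this (a sketch; the exact split depends on your filter complexity and indexing volume):

```shell
# /etc/default/logstash -- Logstash rarely needs 6GB of heap;
# a couple of GB is usually plenty for filtering and output buffering.
LS_HEAP_SIZE=2g

# /etc/default/elasticsearch -- give the freed memory to ES instead,
# which does the heavy lifting (indexing, merging, field data).
ES_HEAP_SIZE=12g
```

Restart both services after changing these so the new JVM heap settings are picked up.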