ELK Server Failed

Hi,

My ELK server failed 2 days ago with a 'Too many open files' error in Elasticsearch. I managed to resolve this by moving out old indexes.
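In case it recurs, the node stats API reports the current and maximum file descriptor counts, and the limit can be raised for the Elasticsearch user. A minimal sketch, assuming a default install on localhost:9200 running as the 'elasticsearch' user:

curl -s 'localhost:9200/_nodes/stats/process?pretty'
# compare process.open_file_descriptors against process.max_file_descriptors

# /etc/security/limits.conf (assumes Elasticsearch runs as the 'elasticsearch' user)
elasticsearch - nofile 65536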

Now I can access Kibana, but I'm not seeing any new indexes being created, nor any data being received.

I have received the pipeline blocked error in my Logstash log files, so following information I found I increased my congestion_threshold to 400 (a large number, to stop the circuit breaker tripping).
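For context, that setting sits on the beats input in older versions of the logstash-input-beats plugin; a minimal sketch (the port here is an assumption):

input {
  beats {
    port => 5044   # assumed; whatever port your shippers point at
    # seconds a blocked pipeline is tolerated before the circuit breaker trips
    congestion_threshold => 400
  }
}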

I'm not seeing anything being indexed, and I receive message=>"retrying failed action with response code: 503", :level=>:warn} in the logs.
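For reference, the checks I know of for a 503 on bulk retries are cluster health and the bulk thread pool stats (default host and port assumed; column names vary a little by version):

curl -s 'localhost:9200/_cluster/health?pretty'
curl -s 'localhost:9200/_cat/thread_pool?v&h=host,bulk.active,bulk.queue,bulk.rejected'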

Any suggestions?

Thanks

What is the error in the elasticsearch logs?

None - the last entries are just from when I restarted the server a minute or two ago:

[2018-08-06 15:14:31,619][INFO ][node ] [nms1] started
[2018-08-06 15:14:47,665][INFO ][gateway ] [nms1] recovered [413] indices into cluster_state

How much heap do you have assigned? How many indices and shards are there in the cluster?
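For reference, something like this will pull those numbers (default host and port assumed):

curl -s 'localhost:9200/_cat/nodes?v&h=name,heap.max,heap.percent'
curl -s 'localhost:9200/_cat/indices' | wc -l
curl -s 'localhost:9200/_cluster/health?pretty'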

cluster_name":"nms",
"status":"red",
"timed_out":false,
"number_of_nodes":1,
"number_of_data_nodes":1,
"active_primary_shards":2043,
"active_shards":2043,
"relocating_shards":0,
"initializing_shards":0,
"unassigned_shards":2077,
"delayed_unassigned_shards":0,
"number_of_pending_tasks":0,
"number_of_in_flight_fetch":0,
"task_max_waiting_in_queue_millis":0,
"active_shards_percent_as_number":49.5873786407767}

heap size [19.9gb]

Some detail above...

So... I have noticed my cluster is red and it has loaded only 49% of my shards... Could this be due to moving my old indexes out on the filesystem instead of removing them with XDELETE?

I have put one back and restarted, and can see the active primary shard count has increased.
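The shard listing shows which indices the missing shards belong to and, on recent versions, why they are unassigned (defaults assumed):

curl -s 'localhost:9200/_cat/shards?v&h=index,shard,prirep,state,unassigned.reason' | grep UNASSIGNED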

Would this stop the data getting in?

I think you have far too many shards for a cluster that size. Please read this blog post on shards and sharding.
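In the meantime, deleting unwanted indices through the API, rather than moving their directories out, keeps the cluster state consistent. A sketch, assuming daily logstash-* indices (the pattern is an assumption):

# deletes all January 2018 daily indices; wildcard deletes work unless
# action.destructive_requires_name is enabled
curl -XDELETE 'localhost:9200/logstash-2018.01.*'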

I plan to do this after I get the server working again. I have had this working with more shards, so I should be able to get it back to a workable state first.

Where else can I go with this?

OK, so I have got a little further... I shut down ELK, moved out some indexes from around the time the system crashed, and restarted. I got a burst of data into the system, then my OSSEC servers received a connection refused error.

I have worked out that I can get the data through with a series of service restarts that flush the data through the pipe: the Logstash services followed by Filebeat.
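Roughly this, assuming systemd units named logstash and filebeat:

sudo systemctl restart logstash
sudo systemctl restart filebeat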

This is not ideal, but hopefully once the data catches up Logstash will be able to handle it in real time.
