ELK data not getting saved

Rahul_Kumar2 · May 15, 2018, 6:29am

Hi awesome people

We are using ELK with Amazon SQS in an in-house server.

Elasticsearch log

[2018-05-15T11:39:03,411][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][old][982956][151886] duration [8.8s], collections [1]/[9.8s], total [8.8s]/[4.8d], memory [3.9gb]->[3.7gb]/[3.9gb], all_pools {[young] [399.4mb]->[288.8mb]/[399.4mb]}{[survivor] [41.6mb]->[0b]/[49.8mb]}{[old] [3.5gb]->[3.5gb]/[3.5gb]}
[2018-05-15T11:39:03,411][WARN ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][982956] overhead, spent [8.8s] collecting in the last [9.8s]
[2018-05-15T11:39:13,309][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][old][982958][151887] duration [8s], collections [1]/[8.8s], total [8s]/[4.8d], memory [3.9gb]->[3.8gb]/[3.9gb], all_pools {[young] [399.4mb]->[296.9mb]/[399.4mb]}{[survivor] [1.8mb]->[0b]/[49.8mb]}{[old] [3.5gb]->[3.5gb]/[3.5gb]}
[2018-05-15T11:39:13,309][WARN ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][982958] overhead, spent [8s] collecting in the last [8.8s]
[2018-05-15T11:39:20,377][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][982965] overhead, spent [432ms] collecting in the last [1s]
[2018-05-15T11:39:30,031][WARN ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][982966] overhead, spent [9s] collecting in the last [9.6s]
[2018-05-15T11:39:39,355][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][982975] overhead, spent [442ms] collecting in the last [1s]
[2018-05-15T11:39:47,357][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][982983] overhead, spent [401ms] collecting in the last [1s]
[2018-05-15T11:39:56,612][WARN ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][982984] overhead, spent [9s] collecting in the last [9.2s]
[2018-05-15T11:40:13,156][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][old][982992][151893] duration [9.2s], collections [1]/[9.3s], total [9.2s]/[4.8d], memory [3.9gb]->[3.7gb]/[3.9gb], all_pools {[young] [399.4mb]->[287mb]/[399.4mb]}{[survivor] [48.3mb]->[0b]/[49.8mb]}{[old] [3.5gb]->[3.5gb]/[3.5gb]}
[2018-05-15T11:40:13,156][WARN ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][982992] overhead, spent [9.2s] collecting in the last [9.3s]
[2018-05-15T11:40:22,385][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][983001] overhead, spent [432ms] collecting in the last [1s]
[2018-05-15T11:40:30,386][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][983009] overhead, spent [442ms] collecting in the last [1s]

Logstash log

[2018-05-15T09:22:03,420][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 429 ({"type"=>"es_rejected_execution_exception", "reason"=>"rejected execution of org.elasticsearch.transport.TransportService$7@254974e8 on EsThreadPoolExecutor[bulk, queue capacity = 50, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@7de20ac8[Running, pool size = 6, active threads = 6, queued tasks = 50, completed tasks = 60068]]"})
[2018-05-15T09:22:03,420][ERROR][logstash.outputs.elasticsearch] Retrying individual actions
[2018-05-15T09:22:03,420][ERROR][logstash.outputs.elasticsearch] Action
[2018-05-15T09:22:03,420][ERROR][logstash.outputs.elasticsearch] Action

From the logs and a search of the forum, my understanding is that the memory allocated to the various ELK components is almost full.
The only solution I can think of right now is to free some space, but I am not sure what to clear or how to clear it.

Note: We don't need last year's data; it can be cleared from everywhere.

Need help, thanks.

JKhondhu · May 16, 2018, 5:07pm

@Rahul_Kumar2, what is the output of _cluster/health?pretty? Are you oversharded? My current presumption is that your cluster is under memory pressure from having too many indices for the current resources, so you are seeing bulk rejections (see _cat/thread_pool?v), hence the 429 pushback to Logstash.

Note that the log excerpt above shows old-generation garbage collections taking just under 10 seconds each and freeing only about 0.2 GB of the Elasticsearch heap.
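
To make that reading of the GC log concrete, here is a minimal sketch that pulls the heap figures out of a JvmGcMonitorService line. The helper names (heap_summary, to_bytes) are made up for illustration, and the regular expression only assumes the "memory [before]->[after]/[total]" layout seen in the excerpt above; other log layouts would need adjusting.

```python
import re

# Matches the heap summary inside a JvmGcMonitorService line, e.g.
# "memory [3.9gb]->[3.7gb]/[3.9gb]" (layout taken from the log above).
MEMORY_RE = re.compile(
    r"memory \[(?P<before>[\d.]+)(?P<bu>[kmg]?b)\]->"
    r"\[(?P<after>[\d.]+)(?P<au>[kmg]?b)\]/"
    r"\[(?P<total>[\d.]+)(?P<tu>[kmg]?b)\]"
)

UNIT_BYTES = {"b": 1, "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3}


def to_bytes(value, unit):
    """Convert a value such as ("3.9", "gb") to bytes."""
    return float(value) * UNIT_BYTES[unit]


def heap_summary(line):
    """Return freed heap (GB) and heap usage after GC (%), or None if no match."""
    m = MEMORY_RE.search(line)
    if m is None:
        return None
    before = to_bytes(m["before"], m["bu"])
    after = to_bytes(m["after"], m["au"])
    total = to_bytes(m["total"], m["tu"])
    return {
        "freed_gb": (before - after) / UNIT_BYTES["gb"],
        "used_after_pct": 100.0 * after / total,
    }


# Heap portion of the first GC line in the original post.
line = "[gc][old][982956][151886] duration [8.8s], memory [3.9gb]->[3.7gb]/[3.9gb]"
summary = heap_summary(line)
print(summary)
```

For that line the sketch reports roughly 0.2 GB freed with the heap still about 95% full after an 8.8 second collection, which is the pattern that points at memory pressure.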

Rahul_Kumar2 · May 17, 2018, 7:21am

@JKhondhu Thanks for the reply.
I think I need to clear data (indices and logs). Should I do that? Will it help?
I am not sure how to proceed; could you guide me on this?

Meanwhile, here are the results you asked for:
curl -XGET "localhost:9999/_cluster/health"

{ "cluster_name":"xxxx-api-logs",
"status":"red",
"timed_out":false,
"number_of_nodes":1,
"number_of_data_nodes":1,
"active_primary_shards":3577,
"active_shards":3577,
"relocating_shards":0,
"initializing_shards":4,
"unassigned_shards":3971,
"delayed_unassigned_shards":0,
"number_of_pending_tasks":144,
"number_of_in_flight_fetch":0,
"task_max_waiting_in_queue_millis":67954,
"active_shards_percent_as_number":47.364936440677965}

curl -XGET "localhost:9999/_cat/thread_pool?v"

node_name name                active queue rejected
node-1    bulk                     1     0      454
node-1    fetch_shard_started      0     0        0
node-1    fetch_shard_store        0     0        0
node-1    flush                    3     1        0
node-1    force_merge              0     0        0
node-1    generic                  1     0        0
node-1    get                      0     0        0
node-1    index                    0     0        0
node-1    listener                 0     0        0
node-1    management               4     0        0
node-1    refresh                  0     0        0
node-1    search                   0     0        0
node-1    snapshot                 0     0        0
node-1    warmer                   0     0        0

dadoonet · May 17, 2018, 8:06am

You probably have too many shards per node.

May I suggest you look at the following resources about sizing:

https://www.elastic.co/elasticon/conf/2016/sf/quantitative-cluster-sizing
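
As a rough illustration of "too many shards per node", the sketch below compares the shard counts from the _cluster/health output posted above with a heap-based budget. The figure of 20 shards per GB of heap is an assumption here: it is a commonly quoted rule of thumb, not a hard limit, and the 4 GB heap is rounded from the 3.9gb shown in the GC log. The function name shard_budget is hypothetical.

```python
import json

# Fields copied from the _cluster/health output posted in this thread.
health = json.loads("""
{"active_shards": 3577,
 "initializing_shards": 4,
 "unassigned_shards": 3971,
 "number_of_data_nodes": 1}
""")

HEAP_GB = 4              # approximate; the GC log shows a 3.9gb heap
SHARDS_PER_GB_HEAP = 20  # assumption: rule of thumb, not a hard limit


def shard_budget(health, heap_gb, shards_per_gb=SHARDS_PER_GB_HEAP):
    """Return (total shards, active shards per data node, per-node budget)."""
    total = (
        health["active_shards"]
        + health["initializing_shards"]
        + health["unassigned_shards"]
    )
    active_per_node = health["active_shards"] / health["number_of_data_nodes"]
    budget = heap_gb * shards_per_gb
    return total, active_per_node, budget


total, active_per_node, budget = shard_budget(health, HEAP_GB)
print(f"total shards in cluster state: {total}")
print(f"active shards on the node:     {active_per_node:.0f}")
print(f"rule-of-thumb budget:          {budget}")
```

Under these assumptions the single node holds several thousand active shards against a budget of a few dozen, so the gap is large enough that the exact rule of thumb hardly matters.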

Rahul_Kumar2 · May 17, 2018, 12:29pm

@JKhondhu I implemented Filebeat and apparently everything started to work, but I don't know how long it will last. If you have any idea, let me know.

Thanks

JKhondhu · May 17, 2018, 12:49pm

Hi,

Implementing Filebeat as opposed to Logstash will not solve your problem. You will see the same thing happen.

You have too much data (indices) residing in Elasticsearch. You need to either (a) delete data older than the retention period you need, or (b) grow your cluster.

Close to 4000 shards on a single node with a 4GB heap is crazy and asking for trouble. Please see all the links @dadoonet shared and start putting in place Curator or your own job to delete indices older than X days, freeing resources so your cluster can ingest the new data coming in via Logstash.
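
If you write your own job rather than use Curator, the core of it is deciding which indices fall outside the retention window. The sketch below shows only that selection step. It assumes daily indices named logstash-YYYY.MM.dd (the Logstash default; the actual index names were not posted in this thread), and the function name indices_past_retention is made up for illustration. Each selected index would then be removed with the delete index API (DELETE /<index-name>).

```python
from datetime import date, datetime, timedelta


def indices_past_retention(index_names, retention_days, today,
                           prefix="logstash-", date_format="%Y.%m.%d"):
    """Return the names of date-suffixed indices older than the retention window.

    Indices without the prefix, or whose suffix is not a date, are left alone.
    """
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in index_names:
        if not name.startswith(prefix):
            continue
        try:
            day = datetime.strptime(name[len(prefix):], date_format).date()
        except ValueError:
            continue  # suffix is not a date; do not touch this index
        if day < cutoff:
            expired.append(name)
    return sorted(expired)


# Hypothetical index names; in practice they would come from _cat/indices.
names = [
    "logstash-2017.05.01",
    "logstash-2018.05.14",
    "logstash-2018.05.15",
    ".kibana",
    "logstash-notadate",
]
expired = indices_past_retention(names, retention_days=30, today=date(2018, 5, 17))
print(expired)
```

Keeping the selection as a pure function makes it easy to dry-run against the real index list before anything is deleted.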

Rahul_Kumar2 · May 18, 2018, 5:01am

@JKhondhu @dadoonet Thanks for the advice. I am looking into the shared resources while also learning the ELK stack. For now, I have increased the heap size to 8GB and everything is working. I will also put Curator in place.

Regards

system · June 15, 2018, 5:15am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
