Running curator is no longer freeing up space for us

elasticsearch 5.3.1

We have a single-node ES cluster with roughly 600 GB assigned to it. Every so often it gets above the high watermark and I run Curator, usually deleting any indices older than 60 days. In the past this would free up several hundred GB of space and we'd be good to go for a while. I noticed recently that when I ran Curator we'd get less and less free space back. Today, when I got alerted that the disk was full, I ran Curator and got almost no space back.
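I don't have the exact action file in front of me, but the Curator run is roughly equivalent to this singleton-CLI invocation (the host, index prefix, and filter values here are illustrative):

curator_cli --host 127.0.0.1 delete_indices --filter_list '[
  {"filtertype": "pattern", "kind": "prefix", "value": "logstash-"},
  {"filtertype": "age", "source": "name", "direction": "older", "timestring": "%Y.%m.%d", "unit": "days", "unit_count": 60}
]'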

I'm not exactly sure what changed, and I would love some help figuring out what could be causing this. Curator still reports that it's deleting indices, but the space isn't being reclaimed.

I thought maybe ES was too busy trying to keep up with writing data AND purging space, so I let it run for a while today with Logstash disabled so that no new logs were coming in. This didn't seem to have any impact.

Here's some info from the server:

root@ps-prod-elk:/var/log/elasticsearch# cd /var/lib/elasticsearch/nodes/0/
root@ps-prod-elk:/var/lib/elasticsearch/nodes/0# du -shx .
571G .
root@ps-prod-elk:/var/lib/elasticsearch/nodes/0# df -h /
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/prod--elk--vg00-lv--root 618G 576G 11G 99% /

root@ps-prod-elk:/var/lib/elasticsearch/nodes/0# curl 'localhost:9200/_cat/indices?v'
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open logstash-2017.12.28 ELoaXq5kQ-6EdWCbeU2bFg 5 0 105281 0 40.7mb 40.7mb
green open logstash-2017.12.21 lq5Y4phGQDS6p4AxDS_Cbg 5 0 111098 0 42.5mb 42.5mb
green open logstash-2017.10.30 u8_kuBoSRyOxPw17VKWtUQ 5 0 21254606 0 17.8gb 17.8gb
green open logstash-2017.12.07 N1O-mjJ7QkKYsSejrkdB_Q 5 0 109788 0 42.7mb 42.7mb
green open logstash-2017.11.10 Cet5LarcSdSCbShEDR7oZQ 5 0 21299070 0 18.6gb 18.6gb
green open logstash-2017.12.03 ZBVXQ1wNSnmBAe50KBZrdA 5 0 113258 0 43.6mb 43.6mb
green open logstash-2017.11.11 nx-9J50yTsmUVFzrLCXr7Q 5 0 10336451 0 7.6gb 7.6gb
green open logstash-2017.12.13 k6rtVbPsSqe2W2jWm9h8bw 5 0 109179 0 41.7mb 41.7mb
green open logstash-2017.11.13 nFzSN1JzSgWsL4Kf1Sze-Q 5 0 24398517 0 21.6gb 21.6gb
green open logstash-2017.12.18 Cy5ekdXeTw-ts7HUuO5pyQ 5 0 104397 0 40.5mb 40.5mb
green open logstash-2017.12.11 8h8054B6QK-KkULc5-IA1Q 5 0 111067 0 43mb 43mb
green open logstash-2017.12.26 11QiJeEzTDCUk7GYKgnvBA 5 0 114170 0 43.1mb 43.1mb
green open logstash-2017.10.29 Jt2L8swQS1K5LrU1NUPKwA 5 0 9087462 0 6.1gb 6.1gb
green open logstash-2017.11.15 TNECzWEMR32pg7m7Loh0VQ 5 0 26008448 0 23.2gb 23.2gb
green open logstash-2017.11.27 6-QE7cPgT_CPyz-gP94NpA 5 0 107056 0 41.4mb 41.4mb
green open logstash-2017.12.19 9sGxQ6urT-W3ZfckMe58g 5 0 106945 0 40.7mb 40.7mb
green open logstash-2017.12.14 4IzkQngSQW-mlpEsk4jRhg 5 0 108428 0 41.7mb 41.7mb
green open logstash-2017.12.10 P-DwseVeRv2w6dyKkKzqyw 5 0 132400 0 51.1mb 51.1mb
green open logstash-2017.11.12 W-8zBAQsQI-LXpn7olHQhw 5 0 9726476 0 7gb 7gb
green open logstash-2017.10.31 ISMDie9qSu6dFm-VaGe6Hg 5 0 20832737 0 17.5gb 17.5gb
green open logstash-2017.10.22 BBpBZ2HrT5O5_9ABSUjnpA 5 0 7474561 0 4.4gb 4.4gb
green open logstash-2017.12.09 ZCt5ef6BSOWJTiYG8vOnJQ 5 0 113875 0 43.8mb 43.8mb
green open logstash-2017.10.26 aofj3vvaR_yF7qfvfC64zw 5 0 17958135 0 14.7gb 14.7gb
green open logstash-2017.12.06 oNPaiKgJQrG8Qe2np_9nqQ 5 0 110867 0 42.9mb 42.9mb
green open logstash-2017.11.26 iLhzuu_DQSy7hgqivC5dTQ 5 0 110007 0 42.9mb 42.9mb
green open logstash-2017.10.21 mOR9b8h9To6k9wmWQ20THg 5 0 7569658 0 4.5gb 4.5gb
green open logstash-2017.11.20 tO98PDi-TZmp44pEi7QlDw 5 0 21880906 0 19.2gb 19.2gb
green open logstash-2017.12.30 RnS2jBOLTam7T7JKd_IpNA 5 0 106561 0 40.9mb 40.9mb
green open logstash-2017.10.25 w103dmdrTWC4fuYB258lnQ 5 0 18550266 0 15.1gb 15.1gb
green open logstash-2017.12.08 vT1ntHdPQ6iaW8tUiCBsEw 5 0 114595 0 44.2mb 44.2mb
green open logstash-2017.11.17 gH0tvwIuQ5KiYjU7T6HWvA 5 0 25816269 0 22.8gb 22.8gb
green open logstash-2017.11.01 IJXZxIsRS92YlZ_2v-pJfQ 5 0 21190946 0 17.8gb 17.8gb
green open logstash-2017.10.10 gD78cz4wQcKmo79-keIHKA 5 0 12539609 0 9.3gb 9.3gb
green open logstash-2017.11.07 N0zDo_IPQh-MWjhedJXYDA 5 0 26225074 0 22.9gb 22.9gb
green open logstash-2017.11.22 uyZfssiZSNuJMb5PqnFJiw 5 0 53483 0 23.1mb 23.1mb
green open logstash-2017.10.12 cCL2fNIGS-2X7eAeOL09bg 5 0 12760386 0 9.5gb 9.5gb
green open logstash-2017.11.24 60PALkOsQV6iSWQeRV2bvA 5 0 86154 0 34.1mb 34.1mb
green open logstash-2017.11.19 CyAIgJIRTOiRGezGm92D-w 5 0 10343504 0 7.5gb 7.5gb
green open logstash-2017.11.14 Voc65JIUTdaPTUFLTXzhYA 5 0 24914240 0 22.1gb 22.1gb
green open logstash-2017.10.27 CcxBCyZsSuOPByF0JUWkLA 5 0 17309126 0 14gb 14gb
green open logstash-2017.10.24 7w90aptpQqm7SvkUI4o46g 5 0 20333119 0 16.7gb 16.7gb
green open logstash-2017.11.02 vmhM-VdARri2-xH692KkVA 5 0 21351775 0 17.8gb 17.8gb
green open .kibana rf17ELguQjq-I5ds66xnrw 1 0 80 2 131.8kb 131.8kb
green open logstash-2017.11.05 XTArEU7xQASiDW34cZhL-A 5 0 9759635 0 6.8gb 6.8gb
green open logstash-2017.10.16 bRjuZjyKS8eRNIKkDl8kQQ 5 0 13891992 0 10.5gb 10.5gb
green open logstash-2017.11.06 mtd3A-PvTKe31ZpapgGaFA 5 0 25731549 0 22.4gb 22.4gb
green open logstash-2017.10.15 TpVc_fZdTo-KpnfjtcngvQ 5 0 7623641 0 4.4gb 4.4gb
green open logstash-2017.10.11 eWZPJyzST8SDATZpbuMNBg 5 0 12831294 0 9.6gb 9.6gb
green open logstash-2017.12.17 MmJl9P2BRQaJHmqC-V88eQ 5 0 102134 0 40.1mb 40.1mb
green open logstash-2017.12.29 Mo68OS-TTMCvMdi1F3zuxw 5 0 102029 0 39.2mb 39.2mb
green open logstash-2017.11.16 hYAkxO_CQP69WphPepjs9A 5 0 25347653 0 22.5gb 22.5gb
green open logstash-2017.10.20 nElW2tKDTWWTHzuIINRhtA 5 0 12520805 0 9.3gb 9.3gb
green open logstash-2017.10.28 AH2coafTQ3itzWQkfm802w 5 0 8973639 0 6gb 6gb
green open logstash-2017.12.24 y5MNyQm2Sdm3CzXTUdZ-5A 5 0 109956 0 42mb 42mb
green open logstash-2017.12.22 wUlPQB2JQ7mX7Fhp-tofFw 5 0 108594 0 41.7mb 41.7mb
green open logstash-2017.11.03 MpOqBJjzRPq3l1h4vfnScg 5 0 20488097 0 17.2gb 17.2gb
green open logstash-2017.12.05 twxdWa3aQ_ewLHYE6ZV8Iw 5 0 107937 0 41.8mb 41.8mb
green open logstash-2017.12.02 zTyBW-U8QUG4tvQttIzjrQ 5 0 107285 0 41.9mb 41.9mb
green open logstash-2017.12.04 z-5LTiHrTzGFfaPak390Ww 5 0 115263 0 44.3mb 44.3mb
green open logstash-2017.11.23 T00LxlHDRNSy5s8P4xxAAw 5 0 63852 0 26.1mb 26.1mb
green open logstash-2017.10.17 sgeG36Q5TF-Ni08IpVOGVg 5 0 14567741 0 11.1gb 11.1gb
green open logstash-2017.12.31 hkbc6YQxQfyfoOKb-cqWYw 5 0 107161 0 41.3mb 41.3mb
green open logstash-2017.11.29 hbQK-912SAeUxk3Pqnpvyg 5 0 110375 0 42.4mb 42.4mb
green open logstash-2017.12.25 qclCe2BIQHGxzDmmkBhDFA 5 0 110992 0 41.7mb 41.7mb
green open logstash-2017.10.23 p_KlSsBkSLmqnzY7_m5rSQ 5 0 21614847 0 17.8gb 17.8gb
green open logstash-2017.12.12 8QSQlVLcSRyLhXGxBuVNNg 5 0 110522 0 42.6mb 42.6mb
green open logstash-2017.12.27 i9lwUOV8TPOWuXFDqrKiTw 5 0 123388 0 46.9mb 46.9mb
green open logstash-2017.10.18 x7MY6dgXRUihN2Id1NTEyw 5 0 14761017 0 11.2gb 11.2gb
green open logstash-2017.10.19 ZWyixLQqSQutwm1x24bzRg 5 0 13183249 0 9.9gb 9.9gb
yellow open logstash-2017.10.09 srZW885ZS7mrlw7w5F6HZg 5 1 23 0 183.9kb 183.9kb
green open logstash-2017.11.21 c7TE33rARKG4l-EbodPOGA 5 0 647863 0 281.4mb 281.4mb
yellow open logstash-2017.10.07 Gxm3Iu9wTmq1joNS_CWbLQ 5 1 29 0 284.6kb 284.6kb
green open logstash-2017.12.16 5-POTrAgT6yY1MW_HUzebw 5 0 103012 0 39.5mb 39.5mb
green open logstash-2017.11.18 yKg8Oi_KSZS-v23719S9pQ 5 0 11848539 0 9gb 9gb
green open logstash-2017.11.25 _3TPyI00TVWlRcKJ9WEDvw 5 0 101103 0 39mb 39mb
green open logstash-2017.11.30 x5BRx5-USh2L4uo_H6JQLg 5 0 112322 0 43.3mb 43.3mb
green open logstash-2017.12.23 4-uMgVWxQRmit5jpsHw30A 5 0 123112 0 49.5mb 49.5mb
yellow open logstash-2017.10.08 P7ZUAUpSTM-sodJkdNS7aw 5 1 26 0 254.4kb 254.4kb
green open logstash-2017.10.14 l1Qrf3dsQc-etALwV0PpKg 5 0 7808591 0 4.6gb 4.6gb
green open logstash-2018.01.01 xGCYdnJjS6Spyrp0miFd0w 5 0 41678 0 18mb 18mb
green open logstash-2017.11.28 crzV7LZ5SyCuNWkfZt5ZDw 5 0 109456 0 41.7mb 41.7mb
green open logstash-2017.12.15 9kzwDfGxQXeOnoB_Gxq5TA 5 0 115997 0 44.9mb 44.9mb
green open logstash-2017.12.20 4xGckU1uRtC29lkp_WFBig 5 0 109720 0 42.3mb 42.3mb
green open logstash-2017.10.13 -tvpfoDNSOCP57IJ7aYQXw 5 0 12635230 0 9.3gb 9.3gb
green open logstash-2017.11.08 vOWPJCRCRKWJDO20vFOZCQ 5 0 23758251 0 20.6gb 20.6gb
green open logstash-2017.11.09 8n3Ciq3oRZ-9Z_e2-3MGXQ 5 0 22727699 0 19.9gb 19.9gb
green open logstash-2017.11.04 5YheRnesTduTGpvwWxfndw 5 0 9855506 0 6.9gb 6.9gb
green open logstash-2017.12.01 0pvaiPobSZWhcUDlcRE6bA 5 0 114115 0 44.4mb 44.4mb

This is likely a bad thing. How many shards are on this single node? How big is the heap? Based on the index sizes I'm seeing here, you likely have a very high shard count on that single node. If you have a 30G heap and more than 600 shards on one node, you will begin to see memory pressure in your cluster. That affects everything from indexing to other cluster update actions, like normalizing the cluster state after a delete, because each open shard carries an overhead cost in heap memory regardless of how much data it contains.
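You can see heap usage at a glance with the cat nodes API; something like this (substitute your own host) will show current, percent, and max heap per node:

curl 'localhost:9200/_cat/nodes?v&h=name,heap.current,heap.percent,heap.max'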

You would do well to switch from daily indices to rollover indices, and not roll over until each shard in the index is over 10G. With a single node, you also probably shouldn't have more than 2 shards per index.
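A rough sketch of what that could look like; the index and alias names here are just examples, and in 5.x the rollover conditions are max_age and max_docs (size-based conditions came later), so you'd approximate the 10G-per-shard target with a document count:

# one-time bootstrap: first generation index with a write alias
curl -XPUT 'localhost:9200/logstash-000001' -d '{
  "settings": { "index.number_of_shards": 1, "index.number_of_replicas": 0 },
  "aliases": { "logstash_write": {} }
}'

# run periodically (e.g. from cron): creates logstash-000002, -000003, ... when a condition is met
curl -XPOST 'localhost:9200/logstash_write/_rollover' -d '{
  "conditions": { "max_age": "7d", "max_docs": 50000000 }
}'

Logstash would then index into the logstash_write alias rather than a dated index name.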

Sorry, I'm far from an ES expert but certainly appreciate the help.

As for the heap size, I believe we're using 128 GB. Assuming that's what you mean by heap size, we're starting ES with these options:

/usr/bin/java -Xms128g -Xmx128g

How can I tell how many shards there are?

This setup had been working very well up until this recent issue with curator not freeing up space. This has been in place for about a year.

Oh, my. That's definitely outside our recommended best practices. There are many reasons for that recommendation, one of which is that extremely long garbage collection pauses can cause the cluster to stall. That could also be what you're encountering.
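The usual guidance is to keep the heap at no more than about half of physical RAM and below ~32G so compressed object pointers stay enabled. On a package install that's set in /etc/elasticsearch/jvm.options; for example (values are illustrative, not a prescription for your node):

# /etc/elasticsearch/jvm.options
-Xms31g
-Xmx31g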

The following command will tell you the total number of primary shards:

curl -s http://127.0.0.1:9200/_cat/indices | awk '{sum += $5} END {print sum}'

Substitute your own IP for 127.0.0.1
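If you prefer, the cluster health API reports the same totals directly:

curl -s 'http://127.0.0.1:9200/_cluster/health?pretty'
# look at "active_primary_shards" and "active_shards"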

root@ps-prod-elk:/var/lib/elasticsearch/nodes/0# curl -s http://127.0.0.1:9200/_cat/indices | awk '{sum += $5} END {print sum}'
436

Thanks. The million dollar question is, "What was that number before you ran Curator?"

What is in the Elasticsearch logs on this single node? There are almost certain to be some errors if space isn't being freed. Do you have any kind of monitoring in place, so we can see what is happening with the heap and garbage collection?
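Even without a monitoring stack, the node stats API will show current heap usage and garbage collection counts, e.g.:

curl -s 'http://127.0.0.1:9200/_nodes/stats/jvm?pretty'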

Unfortunately without a time machine I don't think I have any way to tell what that number was before curator ran.

As for errors in the logs, I've been watching them all afternoon while looking into this, and I haven't really seen any. There are informational messages like these:

[2017-11-20T16:04:09,230][WARN ][o.e.c.r.a.DiskThresholdMonitor] [lGLn5hx] high disk watermark [90%] exceeded on [lGLn5hx4TyKKRTRMqT1VDQ][lGLn5hx][/var/lib/elasticsearch/nodes/0] free: 6.9gb[1.1%], shards will be relocated away from this node
[2017-11-20T16:04:09,230][INFO ][o.e.c.r.a.DiskThresholdMonitor] [lGLn5hx] rerouting shards: [high disk watermark exceeded on one or more nodes]
[2017-11-20T16:04:16,531][INFO ][o.e.m.j.JvmGcMonitorService] [lGLn5hx] [gc][3164] overhead, spent [259ms] collecting in the last [1s]
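(As I understand it, those messages refer to the cluster.routing.allocation.disk watermarks, which default to 85% low and 90% high. Presumably I could raise the high watermark temporarily with something like the following, though that obviously wouldn't fix the underlying space problem:)

curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": { "cluster.routing.allocation.disk.watermark.high": "95%" }
}'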

If I stop Logstash, the messages about garbage collection subside. I'm guessing that means Elasticsearch is able to "catch up" without Logstash running, but it still doesn't seem to be freeing up any space. I left it like this for a while with Logstash disabled; nothing seemed to change and nothing different was written to the logs.

At this point, I’d consider stopping Logstash and completely restarting Elasticsearch. Start with a clean JVM.
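Assuming both services are managed by systemd, that's roughly:

systemctl stop logstash
systemctl restart elasticsearch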

OK, I've just done that. How long do you think I should leave it running like this, without logstash enabled?

I just roughly totaled up the sizes of our indices, and it does seem to match what is being used on disk. Maybe the volume of logs we're ingesting has grown, and we simply can't keep as much as we used to. I'll have to pay closer attention to this going forward.
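For reference, I summed them with a one-liner along these lines ($9 is store.size when the header row is omitted):

curl -s 'http://127.0.0.1:9200/_cat/indices?bytes=b' | awk '{sum += $9} END {printf "%.1f GB\n", sum/1024/1024/1024}'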

Again, I recommend using rollover indices, rather than dailies. It will help reduce the size of the cluster state by not needing to have so many open shards. With only a single node, there's no reason to have 5 shards per index.

I will definitely look into that. Where might I be able to find more info about making the switch to that?

Also, let's say you thought something was "artificially" bloating the size of an index, either duplicated log entries or data that doesn't belong in ES. How would you find that when you're dealing with a 20 GB index? I don't have any reason to think that's happening yet, but it does worry me that it could be.
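(I was imagining something like a terms aggregation with min_doc_count set to 2, assuming the mapping has a keyword field such as message.keyword to group on; the index name below is just one of ours as an example:)

curl -s 'localhost:9200/logstash-2017.11.15/_search?pretty' -d '{
  "size": 0,
  "aggs": {
    "dupes": {
      "terms": { "field": "message.keyword", "min_doc_count": 2, "size": 10 }
    }
  }
}'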

Good questions! We'd prefer that new questions like that be asked in new topics, to allow the answers to remain relevant to the original topic.

OK, I'll start some new threads then. Thanks for your help!
