Running curator is no longer freeing up space for us

elasticsearch 5.3.1

We have a single-node ES cluster with roughly 600 GB assigned to it. Every so often it gets above the high watermark and I run Curator, usually deleting any indices older than 60 days. In the past this would free up several hundred GB of space and we'd be good to go for a while. I noticed recently that when I ran Curator we'd get less and less free space back. Today, when I got alerted that the disk was full, I ran Curator and got almost no space back.
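I don't have the exact action file in front of me, but the Curator run is roughly equivalent to this singleton-CLI invocation (the host, index prefix, and filter values here are illustrative):

curator_cli --host 127.0.0.1 delete_indices --filter_list '[
  {"filtertype": "pattern", "kind": "prefix", "value": "logstash-"},
  {"filtertype": "age", "source": "name", "direction": "older", "timestring": "%Y.%m.%d", "unit": "days", "unit_count": 60}
]'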

I'm not exactly sure what changed, and I would love some help figuring out what could be causing this. Curator still reports that it's deleting indices, but the space isn't being reclaimed.

I thought maybe ES was too busy trying to keep up with writing data AND purging space, so I let it run for a while today with Logstash disabled so that no new logs were coming in. This didn't seem to have any impact.

Here's some info from the server:

root@ps-prod-elk:/var/log/elasticsearch# cd /var/lib/elasticsearch/nodes/0/
root@ps-prod-elk:/var/lib/elasticsearch/nodes/0# du -shx .
571G .
root@ps-prod-elk:/var/lib/elasticsearch/nodes/0# df -h /
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/prod--elk--vg00-lv--root 618G 576G 11G 99% /

root@ps-prod-elk:/var/lib/elasticsearch/nodes/0# curl 'localhost:9200/_cat/indices?v'
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open logstash-2017.12.28 ELoaXq5kQ-6EdWCbeU2bFg 5 0 105281 0 40.7mb 40.7mb
green open logstash-2017.12.21 lq5Y4phGQDS6p4AxDS_Cbg 5 0 111098 0 42.5mb 42.5mb
green open logstash-2017.10.30 u8_kuBoSRyOxPw17VKWtUQ 5 0 21254606 0 17.8gb 17.8gb
green open logstash-2017.12.07 N1O-mjJ7QkKYsSejrkdB_Q 5 0 109788 0 42.7mb 42.7mb
green open logstash-2017.11.10 Cet5LarcSdSCbShEDR7oZQ 5 0 21299070 0 18.6gb 18.6gb
green open logstash-2017.12.03 ZBVXQ1wNSnmBAe50KBZrdA 5 0 113258 0 43.6mb 43.6mb
green open logstash-2017.11.11 nx-9J50yTsmUVFzrLCXr7Q 5 0 10336451 0 7.6gb 7.6gb
green open logstash-2017.12.13 k6rtVbPsSqe2W2jWm9h8bw 5 0 109179 0 41.7mb 41.7mb
green open logstash-2017.11.13 nFzSN1JzSgWsL4Kf1Sze-Q 5 0 24398517 0 21.6gb 21.6gb
green open logstash-2017.12.18 Cy5ekdXeTw-ts7HUuO5pyQ 5 0 104397 0 40.5mb 40.5mb
green open logstash-2017.12.11 8h8054B6QK-KkULc5-IA1Q 5 0 111067 0 43mb 43mb
green open logstash-2017.12.26 11QiJeEzTDCUk7GYKgnvBA 5 0 114170 0 43.1mb 43.1mb
green open logstash-2017.10.29 Jt2L8swQS1K5LrU1NUPKwA 5 0 9087462 0 6.1gb 6.1gb
green open logstash-2017.11.15 TNECzWEMR32pg7m7Loh0VQ 5 0 26008448 0 23.2gb 23.2gb
green open logstash-2017.11.27 6-QE7cPgT_CPyz-gP94NpA 5 0 107056 0 41.4mb 41.4mb
green open logstash-2017.12.19 9sGxQ6urT-W3ZfckMe58g 5 0 106945 0 40.7mb 40.7mb
green open logstash-2017.12.14 4IzkQngSQW-mlpEsk4jRhg 5 0 108428 0 41.7mb 41.7mb
green open logstash-2017.12.10 P-DwseVeRv2w6dyKkKzqyw 5 0 132400 0 51.1mb 51.1mb
green open logstash-2017.11.12 W-8zBAQsQI-LXpn7olHQhw 5 0 9726476 0 7gb 7gb
green open logstash-2017.10.31 ISMDie9qSu6dFm-VaGe6Hg 5 0 20832737 0 17.5gb 17.5gb
green open logstash-2017.10.22 BBpBZ2HrT5O5_9ABSUjnpA 5 0 7474561 0 4.4gb 4.4gb
green open logstash-2017.12.09 ZCt5ef6BSOWJTiYG8vOnJQ 5 0 113875 0 43.8mb 43.8mb
green open logstash-2017.10.26 aofj3vvaR_yF7qfvfC64zw 5 0 17958135 0 14.7gb 14.7gb
green open logstash-2017.12.06 oNPaiKgJQrG8Qe2np_9nqQ 5 0 110867 0 42.9mb 42.9mb
green open logstash-2017.11.26 iLhzuu_DQSy7hgqivC5dTQ 5 0 110007 0 42.9mb 42.9mb
green open logstash-2017.10.21 mOR9b8h9To6k9wmWQ20THg 5 0 7569658 0 4.5gb 4.5gb
green open logstash-2017.11.20 tO98PDi-TZmp44pEi7QlDw 5 0 21880906 0 19.2gb 19.2gb
green open logstash-2017.12.30 RnS2jBOLTam7T7JKd_IpNA 5 0 106561 0 40.9mb 40.9mb
green open logstash-2017.10.25 w103dmdrTWC4fuYB258lnQ 5 0 18550266 0 15.1gb 15.1gb
green open logstash-2017.12.08 vT1ntHdPQ6iaW8tUiCBsEw 5 0 114595 0 44.2mb 44.2mb
green open logstash-2017.11.17 gH0tvwIuQ5KiYjU7T6HWvA 5 0 25816269 0 22.8gb 22.8gb
green open logstash-2017.11.01 IJXZxIsRS92YlZ_2v-pJfQ 5 0 21190946 0 17.8gb 17.8gb
green open logstash-2017.10.10 gD78cz4wQcKmo79-keIHKA 5 0 12539609 0 9.3gb 9.3gb
green open logstash-2017.11.07 N0zDo_IPQh-MWjhedJXYDA 5 0 26225074 0 22.9gb 22.9gb
green open logstash-2017.11.22 uyZfssiZSNuJMb5PqnFJiw 5 0 53483 0 23.1mb 23.1mb
green open logstash-2017.10.12 cCL2fNIGS-2X7eAeOL09bg 5 0 12760386 0 9.5gb 9.5gb
green open logstash-2017.11.24 60PALkOsQV6iSWQeRV2bvA 5 0 86154 0 34.1mb 34.1mb
green open logstash-2017.11.19 CyAIgJIRTOiRGezGm92D-w 5 0 10343504 0 7.5gb 7.5gb
green open logstash-2017.11.14 Voc65JIUTdaPTUFLTXzhYA 5 0 24914240 0 22.1gb 22.1gb
green open logstash-2017.10.27 CcxBCyZsSuOPByF0JUWkLA 5 0 17309126 0 14gb 14gb
green open logstash-2017.10.24 7w90aptpQqm7SvkUI4o46g 5 0 20333119 0 16.7gb 16.7gb
green open logstash-2017.11.02 vmhM-VdARri2-xH692KkVA 5 0 21351775 0 17.8gb 17.8gb
green open .kibana rf17ELguQjq-I5ds66xnrw 1 0 80 2 131.8kb 131.8kb
green open logstash-2017.11.05 XTArEU7xQASiDW34cZhL-A 5 0 9759635 0 6.8gb 6.8gb
green open logstash-2017.10.16 bRjuZjyKS8eRNIKkDl8kQQ 5 0 13891992 0 10.5gb 10.5gb
green open logstash-2017.11.06 mtd3A-PvTKe31ZpapgGaFA 5 0 25731549 0 22.4gb 22.4gb
green open logstash-2017.10.15 TpVc_fZdTo-KpnfjtcngvQ 5 0 7623641 0 4.4gb 4.4gb
green open logstash-2017.10.11 eWZPJyzST8SDATZpbuMNBg 5 0 12831294 0 9.6gb 9.6gb
green open logstash-2017.12.17 MmJl9P2BRQaJHmqC-V88eQ 5 0 102134 0 40.1mb 40.1mb
green open logstash-2017.12.29 Mo68OS-TTMCvMdi1F3zuxw 5 0 102029 0 39.2mb 39.2mb
green open logstash-2017.11.16 hYAkxO_CQP69WphPepjs9A 5 0 25347653 0 22.5gb 22.5gb
green open logstash-2017.10.20 nElW2tKDTWWTHzuIINRhtA 5 0 12520805 0 9.3gb 9.3gb
green open logstash-2017.10.28 AH2coafTQ3itzWQkfm802w 5 0 8973639 0 6gb 6gb
green open logstash-2017.12.24 y5MNyQm2Sdm3CzXTUdZ-5A 5 0 109956 0 42mb 42mb
green open logstash-2017.12.22 wUlPQB2JQ7mX7Fhp-tofFw 5 0 108594 0 41.7mb 41.7mb
green open logstash-2017.11.03 MpOqBJjzRPq3l1h4vfnScg 5 0 20488097 0 17.2gb 17.2gb
green open logstash-2017.12.05 twxdWa3aQ_ewLHYE6ZV8Iw 5 0 107937 0 41.8mb 41.8mb
green open logstash-2017.12.02 zTyBW-U8QUG4tvQttIzjrQ 5 0 107285 0 41.9mb 41.9mb
green open logstash-2017.12.04 z-5LTiHrTzGFfaPak390Ww 5 0 115263 0 44.3mb 44.3mb
green open logstash-2017.11.23 T00LxlHDRNSy5s8P4xxAAw 5 0 63852 0 26.1mb 26.1mb
green open logstash-2017.10.17 sgeG36Q5TF-Ni08IpVOGVg 5 0 14567741 0 11.1gb 11.1gb
green open logstash-2017.12.31 hkbc6YQxQfyfoOKb-cqWYw 5 0 107161 0 41.3mb 41.3mb
green open logstash-2017.11.29 hbQK-912SAeUxk3Pqnpvyg 5 0 110375 0 42.4mb 42.4mb
green open logstash-2017.12.25 qclCe2BIQHGxzDmmkBhDFA 5 0 110992 0 41.7mb 41.7mb
green open logstash-2017.10.23 p_KlSsBkSLmqnzY7_m5rSQ 5 0 21614847 0 17.8gb 17.8gb
green open logstash-2017.12.12 8QSQlVLcSRyLhXGxBuVNNg 5 0 110522 0 42.6mb 42.6mb
green open logstash-2017.12.27 i9lwUOV8TPOWuXFDqrKiTw 5 0 123388 0 46.9mb 46.9mb
green open logstash-2017.10.18 x7MY6dgXRUihN2Id1NTEyw 5 0 14761017 0 11.2gb 11.2gb
green open logstash-2017.10.19 ZWyixLQqSQutwm1x24bzRg 5 0 13183249 0 9.9gb 9.9gb
yellow open logstash-2017.10.09 srZW885ZS7mrlw7w5F6HZg 5 1 23 0 183.9kb 183.9kb
green open logstash-2017.11.21 c7TE33rARKG4l-EbodPOGA 5 0 647863 0 281.4mb 281.4mb
yellow open logstash-2017.10.07 Gxm3Iu9wTmq1joNS_CWbLQ 5 1 29 0 284.6kb 284.6kb
green open logstash-2017.12.16 5-POTrAgT6yY1MW_HUzebw 5 0 103012 0 39.5mb 39.5mb
green open logstash-2017.11.18 yKg8Oi_KSZS-v23719S9pQ 5 0 11848539 0 9gb 9gb
green open logstash-2017.11.25 _3TPyI00TVWlRcKJ9WEDvw 5 0 101103 0 39mb 39mb
green open logstash-2017.11.30 x5BRx5-USh2L4uo_H6JQLg 5 0 112322 0 43.3mb 43.3mb
green open logstash-2017.12.23 4-uMgVWxQRmit5jpsHw30A 5 0 123112 0 49.5mb 49.5mb
yellow open logstash-2017.10.08 P7ZUAUpSTM-sodJkdNS7aw 5 1 26 0 254.4kb 254.4kb
green open logstash-2017.10.14 l1Qrf3dsQc-etALwV0PpKg 5 0 7808591 0 4.6gb 4.6gb
green open logstash-2018.01.01 xGCYdnJjS6Spyrp0miFd0w 5 0 41678 0 18mb 18mb
green open logstash-2017.11.28 crzV7LZ5SyCuNWkfZt5ZDw 5 0 109456 0 41.7mb 41.7mb
green open logstash-2017.12.15 9kzwDfGxQXeOnoB_Gxq5TA 5 0 115997 0 44.9mb 44.9mb
green open logstash-2017.12.20 4xGckU1uRtC29lkp_WFBig 5 0 109720 0 42.3mb 42.3mb
green open logstash-2017.10.13 -tvpfoDNSOCP57IJ7aYQXw 5 0 12635230 0 9.3gb 9.3gb
green open logstash-2017.11.08 vOWPJCRCRKWJDO20vFOZCQ 5 0 23758251 0 20.6gb 20.6gb
green open logstash-2017.11.09 8n3Ciq3oRZ-9Z_e2-3MGXQ 5 0 22727699 0 19.9gb 19.9gb
green open logstash-2017.11.04 5YheRnesTduTGpvwWxfndw 5 0 9855506 0 6.9gb 6.9gb
green open logstash-2017.12.01 0pvaiPobSZWhcUDlcRE6bA 5 0 114115 0 44.4mb 44.4mb

This is likely a bad thing. How many shards are on this single node? How big is the heap? Based on the index sizes I'm seeing here, you likely have a very high shard count on that single node. If you have a 30G heap and more than 600 shards on one node, you will begin to see memory pressure in your cluster. That affects everything from indexing to other cluster update actions, like normalizing the cluster state after a delete, because each open shard carries an overhead cost in heap memory regardless of how much data it contains.
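You can see heap usage at a glance with the cat nodes API; something like this (substitute your own host) will show current, percent, and max heap per node:

curl 'localhost:9200/_cat/nodes?v&h=name,heap.current,heap.percent,heap.max'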

You would do well to switch from daily indices to rollover indices, and not roll over until each shard in the index is over 10G. With a single node, you also probably shouldn't have more than 2 shards per index.
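A rough sketch of what that could look like; the index and alias names here are just examples, and in 5.x the rollover conditions are max_age and max_docs (size-based conditions came later), so you'd approximate the 10G-per-shard target with a document count:

# one-time bootstrap: first generation index with a write alias
curl -XPUT 'localhost:9200/logstash-000001' -d '{
  "settings": { "index.number_of_shards": 1, "index.number_of_replicas": 0 },
  "aliases": { "logstash_write": {} }
}'

# run periodically (e.g. from cron): creates logstash-000002, -000003, ... when a condition is met
curl -XPOST 'localhost:9200/logstash_write/_rollover' -d '{
  "conditions": { "max_age": "7d", "max_docs": 50000000 }
}'

Logstash would then index into the logstash_write alias rather than a dated index name.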

Sorry, I'm far from an ES expert but certainly appreciate the help.

As for the heap size, I believe we're using 128 GB. Assuming that's what you mean by heap size, we're starting ES with these options:

/usr/bin/java -Xms128g -Xmx128g

How can I tell how many shards there are?

This setup had been working very well up until this recent issue with curator not freeing up space. This has been in place for about a year.

Oh, my. That's definitely outside our recommended best practices. There are many reasons for that recommendation, one of which is that extremely long garbage collection pauses can cause the cluster to stall. That could also be what you're encountering.
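The usual guidance is to keep the heap at no more than about half of physical RAM and below ~32G so compressed object pointers stay enabled. On a package install that's set in /etc/elasticsearch/jvm.options; for example (values are illustrative, not a prescription for your node):

# /etc/elasticsearch/jvm.options
-Xms31g
-Xmx31g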

The following command will tell you the total number of primary shards:

curl -s http://127.0.0.1:9200/_cat/indices | awk '{sum += $5} END {print sum}'

Substitute your own IP for 127.0.0.1
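If you prefer, the cluster health API reports the same totals directly:

curl -s 'http://127.0.0.1:9200/_cluster/health?pretty'
# look at "active_primary_shards" and "active_shards"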

root@ps-prod-elk:/var/lib/elasticsearch/nodes/0# curl -s http://127.0.0.1:9200/_cat/indices | awk '{sum += $5} END {print sum}'
436

Thanks. The million dollar question is, "What was that number before you ran Curator?"

What is in the Elasticsearch logs on this single node? There are almost certain to be some errors if space isn't being freed. Do you have any kind of monitoring in place, so we can see what is happening with the heap and garbage collection?
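Even without a monitoring stack, the node stats API will show current heap usage and garbage collection counts, e.g.:

curl -s 'http://127.0.0.1:9200/_nodes/stats/jvm?pretty'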

Unfortunately without a time machine I don't think I have any way to tell what that number was before curator ran.

As for errors in the logs, I've been watching them all afternoon while looking into this, and I haven't really seen any. There are informational messages like these:

[2017-11-20T16:04:09,230][WARN ][o.e.c.r.a.DiskThresholdMonitor] [lGLn5hx] high disk watermark [90%] exceeded on [lGLn5hx4TyKKRTRMqT1VDQ][lGLn5hx][/var/lib/elasticsearch/nodes/0] free: 6.9gb[1.1%], shards will be relocated away from this node
[2017-11-20T16:04:09,230][INFO ][o.e.c.r.a.DiskThresholdMonitor] [lGLn5hx] rerouting shards: [high disk watermark exceeded on one or more nodes]
[2017-11-20T16:04:16,531][INFO ][o.e.m.j.JvmGcMonitorService] [lGLn5hx] [gc][3164] overhead, spent [259ms] collecting in the last [1s]
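(As I understand it, those messages refer to the cluster.routing.allocation.disk watermarks, which default to 85% low and 90% high. Presumably I could raise the high watermark temporarily with something like the following, though that obviously wouldn't fix the underlying space problem:)

curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": { "cluster.routing.allocation.disk.watermark.high": "95%" }
}'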

If I stop Logstash, the messages about garbage collection subside. I'm guessing that means Elasticsearch is able to "catch up" without Logstash running, but it still doesn't seem to be freeing up any space. I left it like this for a while with Logstash disabled; nothing seemed to change and nothing different was written to the logs.

At this point, I’d consider stopping Logstash and completely restarting Elasticsearch. Start with a clean JVM.
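Assuming both services are managed by systemd, that's roughly:

systemctl stop logstash
systemctl restart elasticsearch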

OK, I've just done that. How long do you think I should leave it running like this, without logstash enabled?

I just roughly totaled up the sizes of our indices, and it does seem to match what is being used on disk. Maybe the volume of logs we're ingesting has grown, and we simply can't keep as much as we used to. I'll have to pay closer attention to this going forward.
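For reference, I summed them with a one-liner along these lines ($9 is store.size when the header row is omitted):

curl -s 'http://127.0.0.1:9200/_cat/indices?bytes=b' | awk '{sum += $9} END {printf "%.1f GB\n", sum/1024/1024/1024}'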

Again, I recommend using rollover indices, rather than dailies. It will help reduce the size of the cluster state by not needing to have so many open shards. With only a single node, there's no reason to have 5 shards per index.

I will definitely look into that. Where might I be able to find more info about making the switch to that?

Also, let's say you thought something was "artificially" bloating the size of an index, either duplicated log entries or data that doesn't belong in ES. How would you find that when you're dealing with a 20 GB index? I don't have any reason to think that's happening yet, but it does worry me that it could be.
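(I was imagining something like a terms aggregation with min_doc_count set to 2, assuming the mapping has a keyword field such as message.keyword to group on; the index name below is just one of ours as an example:)

curl -s 'localhost:9200/logstash-2017.11.15/_search?pretty' -d '{
  "size": 0,
  "aggs": {
    "dupes": {
      "terms": { "field": "message.keyword", "min_doc_count": 2, "size": 10 }
    }
  }
}'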

Good questions! We'd prefer that new questions like that be asked in new topics, to allow the answers to remain relevant to the original topic.

OK, I'll start some new threads then. Thanks for your help!
