Disk usage/shard allocation problems during snapshot creation

version 7.17.12

Last night my cluster stopped ingesting data. One node ran out of disk after a snapshot started. That node normally has plenty of headroom:

used: 57%
available: 1.85TB
total: 4.30TB
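
For reference, per-node headroom figures like these can be pulled from the cat allocation API (the column selection here is just an example):

GET _cat/allocation?v&h=node,disk.percent,disk.used,disk.avail,disk.total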

logs show:

[2023-09-12T00:30:00,182][INFO ][o.e.s.SnapshotsService   ] [secesprd01] snapshot [new-daily:daily-2023.09.11-w21i9jn9qsifim7l65vc0a/zHxRzk2QTPivX6-y08d0mg] started
[2023-09-12T00:30:39,725][INFO ][o.e.c.r.a.DiskThresholdMonitor] [secesprd01] low disk watermark [85%] no longer exceeded on [DsJqLibJQSi9D2lIAUHOrw][secesprd09][/data/elasticsearch/security/nodes/0] free: 534gb[19%]
[2023-09-12T00:30:39,737][WARN ][o.e.c.r.a.d.DiskThresholdDecider] [secesprd01] after allocating [[arkime_sessions3-230905][0], node[6UDagJW2T3eWM-0PQJ0rMA], [P], s[STARTED], a[id=wO2cjVlVQvK-HZoTFtMTtw]] node [DsJqLibJQSi9D2lIAUHOrw] would have more than the allowed 10% free disk threshold (5.3% free), preventing allocation
[2023-09-12T00:30:39,737][WARN ][o.e.c.r.a.d.DiskThresholdDecider] [secesprd01] after allocating [[arkime_sessions3-230911][1], node[6UDagJW2T3eWM-0PQJ0rMA], [P], s[STARTED], a[id=lYe1STNvQpmDkYTQ_UZSDg]] node [DsJqLibJQSi9D2lIAUHOrw] would have more than the allowed 10% free disk threshold (3.8% free), preventing allocation
.......
[2023-09-12T00:56:10,201][WARN ][o.e.c.r.a.d.DiskThresholdDecider] [secesprd01] after allocating [[arkime_sessions3-230910][1], node[kAWPcpoxSNSN9WlUsYlQlg], [P], s[STARTED], a[id=tzbQK9OFS7OBr2csxLeC2g]] node [DsJqLibJQSi9D2lIAUHOrw] would have less than the required threshold of 0b free (currently 422.1gb free, estimated shard size is 789.2gb), preventing allocation

Then there were no more allocation errors, and the snapshot finished hours after the disk problem went away, so it seems unlikely that the problem is related to the snapshot.
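
As an aside, the thresholds in those log lines are the standard disk watermarks (defaults: low 85%, high 90%, flood-stage 95%), and the "allowed 10% free" figure lines up with the 90% high watermark. The effective values can be checked with something like:

GET _cluster/settings?include_defaults=true&filter_path=*.cluster.routing.allocation.disk*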

I have moved the mount point of the backup dir out of the data path as a precaution. I did check whether the backup mount had failed (as it occasionally does), but it looked fine.

The data path has a partition to itself. Nothing else should be writing to it.

Any ideas what happened?

I now think I know what happened. We have a series of indexes that roll over every day, and we were keeping 7 days of data. These indexes store network flow data and were taking up about 700-800 GB/day. Over the weekend there were some changes in the network setup, and the indexes are now tracking at just over 1TB per day.

At midnight the index rolled over before the old index was deleted. The (unexpected) 30% increase in the size of those indexes pushed us over the edge last night. As soon as the old index was deleted, everything was fine.

The lifecycle policy has been changed and one old index has been deleted, so hopefully all will be well tonight!
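
For anyone curious, a minimal sketch of a daily-rollover-plus-delete ILM policy of this kind (the policy name, ages and sizes below are illustrative, not our actual values):

# roll the write index daily (or at ~50gb per primary shard), delete indices 6 days after rollover
PUT _ilm/policy/flow-data-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_age": "1d", "max_primary_shard_size": "50gb" }
        }
      },
      "delete": {
        "min_age": "6d",
        "actions": { "delete": {} }
      }
    }
  }
}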

Is that estimate accurate? That's pretty big for a single shard. In particular, the allocator kind of assumes/requires that there's room for a shard in the gap above each watermark (i.e. between low and high, between high and flood-stage, and between flood-stage and disk-full). Maybe it'd help to have more/smaller shards here.

Thanks David (as always : )

Yes, the shard is large. I was working on the assumption of one shard per eligible node; I have 3 hot nodes, hence 3 shards. Should I double that? I did notice the shard size recommendation when I was researching the problem. One of those nodes is new and has less disk, and hence less free headroom. I will increase the disk allocation on that machine. I was just looking at the amount of free space on the other hot nodes.
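
If I do double it, that would just be a template change along these lines (the template name is a placeholder, the pattern is a guess based on the index names above, and the shard count is illustrative):

# hypothetical composable template bumping primaries from 3 to 6
PUT _index_template/flow-template
{
  "index_patterns": ["arkime_sessions3-*"],
  "template": {
    "settings": { "index.number_of_shards": 6 }
  }
}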

I have also turned off force_merge on warm, as that (I assume) requires the shard to be duplicated.
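
(A force merge does write the new, merged segments before the old ones are deleted, so it can temporarily need extra space on that node, up to roughly the size of the data being merged. In ILM terms the change is just removing the forcemerge action from the warm phase; the current warm-phase actions can be reviewed with:

GET _ilm/policy
)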

Multiple shards per node is normal. Generally we try to avoid the primary shards for a given index being located on the same node, for several reasons, one being ingest load distribution (high-scale use cases). Keeping shard sizes at around 50GB is best practice.
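
A quick way to compare shard sizes against that ~50GB guideline is the cat shards API (the index pattern here is just an example):

GET _cat/shards/arkime_sessions3-*?v&h=index,shard,prirep,node,store&s=store:desc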
