We have a capacity problem with our worker nodes (ELK stack).
root@xxx03:/elkdata# yum list | grep elasticsearch | grep \.12
apm-server.i686 7.12.0-1 elasticsearch-7.x
apm-server.x86_64 7.12.0-1 elasticsearch-7.x
auditbeat.i686 7.12.0-1 elasticsearch-7.x
auditbeat.x86_64 7.12.0-1 elasticsearch-7.x
elastic-agent.i686 7.12.0-1 elasticsearch-7.x
elastic-agent.x86_64 7.12.0-1 elasticsearch-7.x
elasticsearch.x86_64 7.12.0-1 elasticsearch-7.x
enterprise-search.noarch 7.12.0-1 elasticsearch-7.x
filebeat.i686 7.12.0-1 elasticsearch-7.x
filebeat.x86_64 7.12.0-1 elasticsearch-7.x
heartbeat-elastic.i686 7.12.0-1 elasticsearch-7.x
heartbeat-elastic.x86_64 7.12.0-1 elasticsearch-7.x
journalbeat.i686 7.12.0-1 elasticsearch-7.x
journalbeat.x86_64 7.12.0-1 elasticsearch-7.x
kibana.x86_64 7.12.0-1 elasticsearch-7.x
logstash.x86_64 1:7.12.0-1 elasticsearch-7.x
metricbeat.i686 7.12.0-1 elasticsearch-7.x
metricbeat.x86_64 7.12.0-1 elasticsearch-7.x
packetbeat.i686 7.12.0-1 elasticsearch-7.x
packetbeat.x86_64 7.12.0-1 elasticsearch-7.x
This has been installed on worker node #2 and worker node #3 in the same way. Node #1 is the master. Everything works fine, so far.
Since we added some new and very large logging data for a new customer, the current volume
/dev/sdb1 /elkdata
will run out of space in the near future. The volumes are identical on both nodes, not mirrored and, unfortunately, not based on LVM, which would make expansion pretty easy in this case.
Currently we have 500 GB per volume, but we will need at least 1 TB in the near future.
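For comparison, if the volume were LVM-backed, growing it would only take a few commands and no downtime. A minimal sketch, assuming the new disk appears as /dev/sdc and using hypothetical volume group / logical volume names (vg_elk, lv_elkdata):

```shell
# Hypothetical names vg_elk / lv_elkdata -- adapt to the real setup.
pvcreate /dev/sdc                             # initialize the new disk for LVM
vgextend vg_elk /dev/sdc                      # add it to the existing volume group
lvextend -r -L +500G /dev/vg_elk/lv_elkdata   # grow the LV; -r resizes ext4 online
```

This is one reason the new 1 TB volume in the scenario below should be created on LVM from the start.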
The question now is: would the following rescue scenario be workable?
Shut down the complete application on node #2 (the ELK stack on that node would be down; master node #1 and worker node #3 would keep running without any interruption).
After shutting down node #2, a new LVM volume would be added (the VMware team would attach a 1 TB volume, which would then be formatted with ext4 and mounted as /elkdat2).
A raw copy of all the (now "sleeping") data on node #2 would be made onto the newly created LVM volume /elkdat2.
After that, /elkdata and /elkdat2 would be unmounted and remounted the other way around:
/dev/lvm/elkdat2 -> /elkdata
/dev/sdb1 -> /elkdata.old
Then all ELK components would be restarted. The data on this node should then be available just as it was before shutting down this part of the cluster.
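The steps above might look roughly like this on node #2. This is a sketch under assumptions: the new disk shows up as /dev/sdc, the VG/LV names (vg_elk, lv_elkdat2) are invented, and rsync is used instead of a raw dd copy because it preserves ownership and permissions (which matters for the elasticsearch user) and can be re-run:

```shell
# Stop whatever ELK services run on this node
systemctl stop logstash elasticsearch

# Create the new 1 TB volume on LVM and format it (assumed device: /dev/sdc)
pvcreate /dev/sdc
vgcreate vg_elk /dev/sdc
lvcreate -l 100%FREE -n lv_elkdat2 vg_elk
mkfs.ext4 /dev/vg_elk/lv_elkdat2
mkdir -p /elkdat2
mount /dev/vg_elk/lv_elkdat2 /elkdat2

# Copy the idle data, preserving permissions, owners, ACLs and xattrs
rsync -aHAX /elkdata/ /elkdat2/

# Swap the mounts (update /etc/fstab accordingly, or the swap won't survive a reboot)
umount /elkdata /elkdat2
mkdir -p /elkdata.old
mount /dev/sdb1 /elkdata.old
mount /dev/vg_elk/lv_elkdat2 /elkdata

# Bring the node back
systemctl start elasticsearch logstash
```

Keeping the old disk mounted read-only as /elkdata.old gives you a fallback until the node has fully rejoined the cluster.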
Final step: re-index any partially lost indices from the untouched worker node onto the one with the new disk.
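For what it's worth, if the indices have replica shards, Elasticsearch should re-replicate missing copies on its own once the node rejoins; explicit re-indexing would only be needed for indices without replicas. One way to watch the recovery (assuming the default port 9200 on localhost):

```shell
# Cluster health should go from yellow back to green
curl -s 'http://localhost:9200/_cluster/health?pretty'

# Shards that are not yet started
curl -s 'http://localhost:9200/_cat/shards?v' | grep -v STARTED

# Ongoing shard recoveries in detail
curl -s 'http://localhost:9200/_cat/recovery?v&active_only=true'
```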
Has this scenario ever been tried before? Is it workable, or are there too many risks?
Or are there other ways for me to solve this?
Thanks very much.
Moritz