A bit of background on our environment: Logstash is installed with ElastiFlow. We run one Elasticsearch cluster of 4 nodes, with 2 primaries and 2 replicas (2 P and 2 R) spread across 2 zones. The setup collects flow records from 2 WAN routers with NetFlow configured.
We find that disk usage grows by about 40 GB per day on the Elasticsearch nodes. Compression is already set to LZ4. What else can we do to reduce disk usage?
Here are our thoughts.
- Have the routers export flows every 2 minutes instead of every 30 seconds
- Remove the raw data from ElastiFlow (we tried this and did not notice any significant difference)
Can anyone suggest better approaches?
For example, removing indices that are no longer in use (but how do we know which indices those are, and how do we remove them)?
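To make the last question concrete, here is a rough sketch of the kind of retention logic we have in mind. It assumes the indices are daily and end in a `-YYYY.MM.DD` suffix (for example `elastiflow-3.5.3-2020.01.15`, which is a made-up name), and it only decides which names are old enough to remove. The function name `indices_older_than` and the 7-day retention are our own placeholders. As far as we understand, the names themselves could be listed with the `_cat/indices` API, and each selected index could then be removed with a `DELETE /<index-name>` request, but we have not tried this yet.

```python
import re
from datetime import date, datetime, timedelta

# Assumption: daily indices end in "-YYYY.MM.DD", e.g. "elastiflow-3.5.3-2020.01.15".
DATE_SUFFIX = re.compile(r"-(\d{4}\.\d{2}\.\d{2})$")


def indices_older_than(index_names, keep_days, today):
    """Return the index names dated strictly before (today - keep_days)."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in index_names:
        match = DATE_SUFFIX.search(name)
        if match is None:
            continue  # no date suffix (e.g. ".kibana_1"), so leave it alone
        try:
            index_day = datetime.strptime(match.group(1), "%Y.%m.%d").date()
        except ValueError:
            continue  # suffix looks like a date but is not a valid one
        if index_day < cutoff:
            expired.append(name)
    return sorted(expired)


if __name__ == "__main__":
    # Hypothetical index names, only for illustration.
    names = [
        "elastiflow-3.5.3-2020.01.10",
        "elastiflow-3.5.3-2020.01.13",
        "elastiflow-3.5.3-2020.01.19",
        ".kibana_1",
    ]
    print(indices_older_than(names, keep_days=7, today=date(2020, 1, 20)))
```

With the sample names above, only the index dated 2020.01.10 falls before the cutoff of 2020-01-13, so it is the only one selected. Is this a reasonable approach, or is there a built-in way to do it?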