I'm at about 50 Windows servers indexing into weekly indices (i.e. winlogbeat-2018.34 would be the latest week of the year).
Storage is coming out to about 10 GB a week, depending on the amount of activity on the servers. That's simply not sustainable, especially considering all of the other systems that are generating another 10 GB a week on top of that.
My question is: how are people managing their storage?
I'm thinking the biggest problem is that I'm indexing almost 1,000 fields for the winlogbeat events. I plan on mitigating this through the Logstash config. Is anyone doing something similar? Ideally I'd define the 20-30 fields that actually need to be indexed and dump the rest of the data into a single misc field, something like the sketch below. Does that sound reasonable?
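Roughly what I have in mind, as an untested sketch (the ruby filter approach and the field names in `keep` are just placeholders, not something I've run in production):

```
filter {
  ruby {
    # load JSON support once at pipeline startup
    init => "require 'json'"
    code => '
      # fields to keep as real, indexed fields (placeholder list)
      keep = ["message", "event_id", "host", "computer_name"]
      misc = {}
      event.to_hash.each do |k, v|
        next if keep.include?(k) || k.start_with?("@")  # keep @timestamp etc.
        misc[k] = v
        event.remove(k)
      end
      # everything else ends up as one JSON string in a single field
      event.set("misc", misc.to_json) unless misc.empty?
    '
  }
}
```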
Best compression is already enabled, as search performance is at the bottom of my priority list. It's odd to me that compression isn't better: I have millions of log events that are almost exact copies of one another, because they're just logon/logoff Windows events from our authentication management system.
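For reference, this is roughly how I've set it, via an index template (a sketch; the template name and pattern are mine, and the exact template API depends on your Elasticsearch version):

```
PUT _template/winlogbeat
{
  "index_patterns": ["winlogbeat-*"],
  "settings": {
    "index.codec": "best_compression"
  }
}
```

(As I understand it, best_compression only changes how the stored _source is compressed, not the inverted index itself, so maybe that's why near-duplicate events don't shrink as much as I'd expect.)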
I'm pretty new to the Elastic Stack. Currently I have a one-node cluster in my environment as I've been learning. I plan to scale this out once I have storage under control.
You could also store them but make them non-indexed (i.e. not searchable). You'd then search on those 30-ish fields, but you'd still be able to see the values in the events. Something like the mapping sketch below.
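A rough sketch of that in a template mapping (the field name is a placeholder, and on 6.x you'd need to wrap `properties` in the document type, e.g. `"doc"`):

```
PUT _template/winlogbeat
{
  "index_patterns": ["winlogbeat-*"],
  "mappings": {
    "properties": {
      "event_data": {
        "properties": {
          "SomeVerboseField": {
            "type": "keyword",
            "index": false
          }
        }
      }
    }
  }
}
```

The values stay in _source, so they still show up in the hits; they just don't take up space in the index structures.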
You could also look at the aggregate filter in Logstash to trim things down.
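For example, you could collapse the logon/logoff noise into per-user counts, something like this (sketch only; the field names are guesses, and aggregate needs the pipeline running with a single worker, i.e. `pipeline.workers: 1`):

```
filter {
  aggregate {
    # one aggregation bucket per user (field name is a guess, adjust to your events)
    task_id => "%{[event_data][TargetUserName]}"
    code => "
      map['logons'] ||= 0
      map['logons'] += 1
      event.cancel()  # drop the individual event
    "
    push_map_as_event_on_timeout => true
    timeout => 300                       # flush a summary event every 5 minutes
    timeout_task_id_field => "user_name" # put the task_id back on the summary
  }
}
```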
Gotcha.
Are there any guidelines on how to configure fields in the Logstash config file? I'm struggling with the documentation. Ideally, I'd like to say: exclude everything except the following 30 or so fields.
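The closest I've found so far is the prune filter with a whitelist, something like this (untested; the field names are placeholders):

```
filter {
  prune {
    # keep only fields matching these patterns, drop everything else
    # (anchored so e.g. "host" doesn't also match "hostname")
    whitelist_names => ["^@timestamp$", "^host$", "^message$", "^event_id$", "^computer_name$"]
  }
}
```

One caveat I've read about: prune only looks at top-level field names, so nested winlogbeat fields like [event_data][...] get kept or dropped as a whole. Is that the right approach?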