I'm at about 50 Windows servers indexing into weekly indices (i.e. winlogbeat-2018.34 would be the latest week of the year).
Storage is coming out to about 10 GB a week, depending on the amount of activity on the servers. That's simply not sustainable, especially considering all of the other systems that are generating another 10 GB a week on top of that.
My question is: how are people managing their storage?
I'm thinking the biggest problem is that I'm indexing almost 1,000 fields for the winlogbeat events. I plan on mitigating this through the Logstash config. Is anyone doing something similar? Ideally I'd define the 20-30 fields that actually need to be indexed and dump the rest of the data into a single misc field, something like the sketch below. Does that sound reasonable?
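Roughly what I have in mind, as an untested sketch (the ruby filter approach and the field names in `keep` are just placeholders, not something I've run in production):

```
filter {
  ruby {
    # load JSON support once at pipeline startup
    init => "require 'json'"
    code => '
      # fields to keep as real, indexed fields (placeholder list)
      keep = ["message", "event_id", "host", "computer_name"]
      misc = {}
      event.to_hash.each do |k, v|
        next if keep.include?(k) || k.start_with?("@")  # keep @timestamp etc.
        misc[k] = v
        event.remove(k)
      end
      # everything else ends up as one JSON string in a single field
      event.set("misc", misc.to_json) unless misc.empty?
    '
  }
}
```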
Best compression is already enabled, as search performance is at the bottom of my priority list. It's odd to me that compression isn't better: I have millions of log events that are almost exact copies of one another, because they're just logon/logoff Windows events from our authentication management system.
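For reference, this is roughly how I've set it, via an index template (a sketch; the template name and pattern are mine, and the exact template API depends on your Elasticsearch version):

```
PUT _template/winlogbeat
{
  "index_patterns": ["winlogbeat-*"],
  "settings": {
    "index.codec": "best_compression"
  }
}
```

(As I understand it, best_compression only changes how the stored _source is compressed, not the inverted index itself, so maybe that's why near-duplicate events don't shrink as much as I'd expect.)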
I'm pretty new to the Elastic Stack. Currently I have a one-node cluster in my environment as I've been learning. I plan to scale this out once I have storage under control.
You could also store them but make them non-indexed (i.e. not searchable). You'd then search on those 30-ish fields, but you'd still be able to see the values in the events. Something like the mapping sketch below.
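A rough sketch of that in a template mapping (the field name is a placeholder, and on 6.x you'd need to wrap `properties` in the document type, e.g. `"doc"`):

```
PUT _template/winlogbeat
{
  "index_patterns": ["winlogbeat-*"],
  "mappings": {
    "properties": {
      "event_data": {
        "properties": {
          "SomeVerboseField": {
            "type": "keyword",
            "index": false
          }
        }
      }
    }
  }
}
```

The values stay in _source, so they still show up in the hits; they just don't take up space in the index structures.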
You could also look at the aggregate filter in Logstash to trim things down.
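For example, you could collapse the logon/logoff noise into per-user counts, something like this (sketch only; the field names are guesses, and aggregate needs the pipeline running with a single worker, i.e. `pipeline.workers: 1`):

```
filter {
  aggregate {
    # one aggregation bucket per user (field name is a guess, adjust to your events)
    task_id => "%{[event_data][TargetUserName]}"
    code => "
      map['logons'] ||= 0
      map['logons'] += 1
      event.cancel()  # drop the individual event
    "
    push_map_as_event_on_timeout => true
    timeout => 300                       # flush a summary event every 5 minutes
    timeout_task_id_field => "user_name" # put the task_id back on the summary
  }
}
```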
Gotcha.
Are there any guidelines on how to configure fields in the Logstash config file? I'm struggling with the documentation. Ideally, I'd like to say: exclude everything except the following 30 or so fields.
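The closest I've found so far is the prune filter with a whitelist, something like this (untested; the field names are placeholders):

```
filter {
  prune {
    # keep only fields matching these patterns, drop everything else
    # (anchored so e.g. "host" doesn't also match "hostname")
    whitelist_names => ["^@timestamp$", "^host$", "^message$", "^event_id$", "^computer_name$"]
  }
}
```

One caveat I've read about: prune only looks at top-level field names, so nested winlogbeat fields like [event_data][...] get kept or dropped as a whole. Is that the right approach?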