Hourly time-based indices: reducing shard segments

We are using fluentd, Elasticsearch, and Kibana for log aggregation. The amount we log has steadily grown, so we've switched to hourly time-based indices to try to keep our index size under 50 GB, a limit we often exceeded with daily indices. Search performance obviously degrades as the number of indices searched grows: if we now search for something over the past 7 days in Kibana, the query often times out at our 60-second search limit.

Beyond just increasing the timeout, I was looking into how we could optimize the indices that are no longer being written to, and I came across force merging shard segments. From a random sampling of these time-based indices, we at times have around 100 segments for a single hourly index. What would the recommendation be for optimizing these segments? Since they are time-based, these indices should be considered static and read-only once the hour has passed, barring some networking issue. Is the target truly one segment? The max index size I've seen with hourly indices is around 4 GB, and at low-volume times it can be just a couple hundred MB. How does one properly size the segment count for these read-only indices?
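For anyone wanting to reproduce the sampling, segment counts per shard can be inspected with the cat segments API; a minimal sketch, where the index name is just illustrative of an hourly naming scheme:

```
GET _cat/segments/logs-2018.10.01-13?v
```

The output has one row per segment, including its size and document count.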

I would recommend that you have a look at the rollover API. This allows you to create and switch to a new index when necessary, depending on document count and/or age, rather than just time as in your current setup. Using this you can generate more indices under heavy load and fewer when data volumes drop, which will result in fewer shards of much more similar size.
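A minimal sketch of what that could look like, assuming a hypothetical `logs-write` alias and `logs-000001` as the bootstrap index (the size-based `max_size` condition requires Elasticsearch 6.1 or later):

```
# Bootstrap the first index with the alias that rollover acts on
PUT /logs-000001
{
  "aliases": { "logs-write": {} }
}

# Call this periodically (e.g. from cron); a new index is created
# only if at least one condition has been met
POST /logs-write/_rollover
{
  "conditions": {
    "max_age": "1d",
    "max_size": "40gb",
    "max_docs": 50000000
  }
}
```

Indexing should then always go through the alias, so writes move to the new index automatically after each rollover.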

Force merge can, as you suggest, be very useful for older, read-only indices, but it is quite an expensive operation. I am, however, not sure what the optimal number of segments to target is.
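For reference, the call itself is simple; a sketch against a hypothetical hourly index, to be run only once the index is no longer being written to:

```
# Merge all segments of a read-only index down to one
POST /logs-2018.10.01-13/_forcemerge?max_num_segments=1
```

Setting the index read-only first (via the `index.blocks.write` setting) guards against a stray late write creating new segments after the merge.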

Aren't searches within a shard serialized, though, while each index can be searched in parallel?

They are, but searching lots of small indices can be slower than searching fewer, larger ones, as the latter means fewer tasks need to be queued up and executed.

What do you consider small indices? Our hourly indices are normally 4 GB.

A little more about the cluster.

We use it for logging. It has nonprod and prod index prefixes, with each of them getting new hourly indices. These are usually around 4 GB apiece. From casual inspection the average is probably 1.2 million documents or so, but we do have higher peaks.

Any other settings or resources you can recommend for configuration? Right now we have a six-node cluster without any dedicated masters. Our search performance is pretty poor, though; a search can take a couple of minutes depending on the query.

You have to test, but a single shard can be 20-50 GB.


As David pointed out, a good target shard size (not index size) for many logging use cases is often measured in tens of GB. If your largest hourly index is 4 GB (so at most roughly 96 GB per day), a daily index with 6 primary shards (given that you have 6 nodes) might therefore be sufficient.
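As a sketch, if you moved back to daily indices, that shard count could be set via an index template; the template name and pattern here are assumptions, and on 5.x the `index_patterns` key was called `template`:

```
PUT _template/logs-daily
{
  "index_patterns": ["logs-*"],
  "settings": {
    "index.number_of_shards": 6,
    "index.number_of_replicas": 1
  }
}
```

With 6 primaries on 6 nodes, a 96 GB peak day works out to roughly 16 GB per primary shard, in the same ballpark as the sizes David mentioned.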

How about the index refresh interval, then? I get that at the default of 1 second it creates a lot of segments. Is every two minutes too long? Should it be longer or shorter?

That depends on how long you are willing to wait before the data becomes searchable.
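If you want to experiment, the interval can be changed on a live index; a sketch, with an illustrative index name:

```
PUT /logs-2018.10.01-13/_settings
{
  "index": { "refresh_interval": "120s" }
}
```

At 120s, newly indexed log lines can take up to two minutes to appear in Kibana, which is exactly the trade-off above.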
