I have just uploaded 4 years of data (about 4 GB) into an AWS Elasticsearch cluster.
I have indexed the documents by day.
Since I stuck with the default of 5 primary shards per index, the cluster has now grown to more than 7,300 primary shards.
I had also stuck with Elasticsearch's dynamic mapping, but I want to define my own index template now.
I have thought of different options to reduce the shard count:

1. Reindexing the data one index at a time
   - I would need to write a script to reindex each one (a rough `_reindex` sketch is shown after this list).
2. Shrinking each index
   - I don't think I can change the mapping of the index this way (a rough shrink sketch is also shown after this list).
   - I still have to go through the indices one by one.
3. Bulk indexing
   - I don't think it would be a good idea to add an index settings/mapping line to each document record (there are roughly 2 million documents).
4. Uploading the data again
   - 4.a Using Logstash with Elasticsearch as both input and output
   - 4.b Uploading from scratch
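For option 1, this is roughly what I imagine each reindex call would look like; the index names `logs-2017.06.01` and `logs-2017` are just examples, not my real ones:

```
POST _reindex
{
  "source": { "index": "logs-2017.06.01" },
  "dest":   { "index": "logs-2017" }
}
```

The destination index would have to be created (or matched by the new template) with the desired shard count and mapping before reindexing into it.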
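For option 2, my understanding is that the index first has to be made read-only with a copy of every shard on one node, and that `_shrink` can only reduce the shard count, not change the mapping. Roughly (the index and node names are just examples):

```
PUT logs-2017.06.01/_settings
{
  "index.blocks.write": true,
  "index.routing.allocation.require._name": "node-1"
}

POST logs-2017.06.01/_shrink/logs-2017.06.01-shrunk
{
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 0
  }
}
```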
That is indeed far too many shards. If you only have 4GB of data, you should be fine using a yearly index with 1 primary shard. It may actually be easier and more efficient to delete it all and index it again from scratch using the correct template rather than trying to reindex that many indices.
You might get away with monthly indices as well, but it all depends on your cluster spec and data volume. Having lots of very small shards is very inefficient.
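If you do reload from scratch, something along these lines as an index template would give every new yearly index a single primary shard. The index pattern, field names and replica count below are only placeholders, and the exact syntax depends on your version (5.x uses a `template` key instead of `index_patterns`, and 6.x needs the mapping wrapped in a type name):

```
PUT _template/logs
{
  "index_patterns": ["logs-*"],
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  },
  "mappings": {
    "properties": {
      "@timestamp": { "type": "date" },
      "message":    { "type": "text" }
    }
  }
}
```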
It is about 2-3 GB per year.
We are a growing company, and our growth is more than 20% per year.
We plan to keep the data for at least 5-10 years.
It depends on team demand, but they will expect queries to cover more than 5 years of data, mostly for visualizations.
I would recommend going for a yearly index with 1 or 2 primary shards initially. If you realise later on that the shards are getting too big, you can easily adjust it then and switch to a higher number of primary shards or to monthly/quarterly indices.
Thank you very much for your advice.
And I don't think it would be a problem to query the indices using a wildcard if we change the index pattern to monthly/quarterly later on.
Sorry to poke you again: how can I write the index name in the Logstash Elasticsearch output so that it corresponds to a quarterly index pattern?
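For example, I assume a query like this would keep working whether the indices end up being yearly, monthly or quarterly, as long as they share the same prefix (the `logs-` prefix and the timestamp field are just examples):

```
GET logs-*/_search
{
  "query": {
    "range": { "@timestamp": { "gte": "now-5y" } }
  }
}
```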
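As far as I can tell there is no built-in quarter token in the `%{+...}` date format of the index option, so the best I have come up with is computing a quarter label in a ruby filter and referencing it from the output (assuming the Logstash 5.x+ ruby event API; the hosts and the index prefix are placeholders). Is something like this reasonable?

```
filter {
  ruby {
    # store a label like "2018-q3" in @metadata so it is not indexed with the event
    code => "
      t = event.get('@timestamp').time
      event.set('[@metadata][quarter]', format('%d-q%d', t.year, ((t.month - 1) / 3) + 1))
    "
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "logs-%{[@metadata][quarter]}"
  }
}
```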