Tips on Optimization

I have just uploaded 4 years of data (about 4 GB) into an AWS Elasticsearch cluster.
I have indexed documents by day.
Since I stuck with the default of 5 shards per index, it has now grown to more than 7300 primary shards.
I also stuck with Elasticsearch's dynamic mapping, but I want to create my own template now.

I have thought of different options to reduce the shard count:

1. Reindexing the data one index at a time
   - Need to write a script to reindex each one.
2. Shrinking the indices
   - I don't think I can change the mapping of an index this way.
   - Still, I have to do it one index at a time.
3. Bulk indexing
   - I don't think it would be a good idea to add an index settings/mapping line to each document record (there are roughly 2 million documents).
4. Uploading the data again
   - 4.a Using Logstash with Elasticsearch as both input and output
   - 4.b Uploading from scratch
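For option 1, the Reindex API can copy a daily index into a consolidated target index one at a time; a minimal sketch (the index names here are assumptions, not ones from this thread):

```
POST _reindex
{
  "source": { "index": "logs-2017.01.01" },
  "dest":   { "index": "logs-2017" }
}
```

A script would loop this over each daily source index, which is why it feels tedious at 7300+ shards' worth of indices.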

That is indeed far too many shards. If you only have 4GB of data, you should be fine using a yearly index with 1 primary shard. It may actually be easier and more efficient to delete it all and index it again from scratch using the correct template rather than trying to reindex that many indices.
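If you go the delete-and-reindex-from-scratch route, a minimal index template could pin the shard count up front; a sketch, assuming a `logs-*` naming prefix (the template name and pattern are assumptions, and older Elasticsearch versions use `"template"` instead of `"index_patterns"`):

```
PUT _template/yearly_logs
{
  "index_patterns": ["logs-*"],
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  }
}
```

Your own field mappings would go in a `"mappings"` section of the same template, replacing the dynamic mapping you started with.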

Will that make searching or managing the data harder if we index yearly?
By the way, Kibana Discover has pretty good visualizations over time periods.

You might get away with monthly indices as well, but it all depends on your cluster spec and data volume. Having lots of very small shards is very inefficient.

Could we index the data quarterly, since we are pushing it through Logstash?

What is your total data volume per year? How long do you plan to keep the data?

It is about 2-3 GB/year.
We are a growing company, and our growth is more than 20%/year.
We plan to keep the data for at least 5-10 years.
It depends on team demand, but they will expect to cover more than 5 years of data, mostly for visualizations.

I would recommend going for a yearly index with 1 or 2 primary shards initially. If you realise that shards are getting too big later on, you can easily adjust it then and switch to a higher number of primary shards or monthly/quarterly indices.


Thank you very much for your advice.
And I don't think it would be a problem to query the indices using a wildcard if we change the index pattern to monthly/quarterly later on.
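As a sketch of what that wildcard querying looks like, a single search can span yearly and quarterly indices as long as they share a prefix (the `logs-*` prefix is an assumption):

```
GET logs-*/_search
{
  "query": {
    "range": { "@timestamp": { "gte": "now-5y" } }
  }
}
```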

Sorry to poke you again, but how can I write the index name to correspond to a quarterly index pattern in the Logstash Elasticsearch output?

As long as you keep the same prefix, that will not be a problem. I am, however, not sure how to create a quarterly pattern in Logstash.
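One possible approach, not verified in this thread, is to compute the quarter in a `ruby` filter and reference it in the output's `index` setting; a sketch, where the `logs-` prefix and the `[@metadata][quarter]` field name are assumptions:

```
filter {
  ruby {
    # Derive the quarter (1-4) from the event timestamp and stash it
    # in @metadata so it is not indexed with the document itself.
    code => "event.set('[@metadata][quarter]', ((event.get('@timestamp').time.month - 1) / 3) + 1)"
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "logs-%{+YYYY}-q%{[@metadata][quarter]}"
  }
}
```

This would produce index names like `logs-2017-q1`, which still match a `logs-*` wildcard alongside any yearly indices.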

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.