Tips on Optimization

(Aarjan Baskota) #1

I have just uploaded 4 years' worth of data, about 4GB, into an Elasticsearch cluster on AWS.
I have indexed the documents by day.
Since I stuck with the default of 5 primary shards per index, it has now grown to more than 7300 primary shards.
I also stuck with Elasticsearch's dynamic mapping, but I want to create my own template now.

I have thought of different options to reduce the number of shards:

  1. Reindexing the data one index at a time
  • I would need to write a script to reindex each one.

  2. Shrinking the indices
  • I don't think I can change the mapping of the index this way.
  • I would still have to process the indices one by one.

  3. Bulk indexing
  • I don't think it would be a good idea to add an index settings/mapping line to each document record. (There are roughly 2 million documents.)

  4. Uploading the data again
    4.a Using Logstash with Elasticsearch as both input and output
    4.b Uploading from scratch
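
For option 1, the script could be fairly small. The following is a minimal sketch, not a tested migration: it assumes the daily indices are named like `logstash-YYYY.MM.dd` (the prefix and the function name `yearly_reindex_bodies` are hypothetical), and it only builds the request bodies that would be POSTed to the `_reindex` endpoint, one per year, without contacting the cluster.

```python
def yearly_reindex_bodies(daily_indices, prefix="logstash-"):
    """Build one _reindex request body per year found in the daily index names.

    Assumption: every name looks like "<prefix>YYYY.MM.dd".
    """
    # Collect the distinct years that appear right after the prefix.
    years = sorted({name[len(prefix):len(prefix) + 4] for name in daily_indices})
    bodies = {}
    for year in years:
        dest = f"{prefix}{year}"
        bodies[dest] = {
            # The wildcard matches that year's daily indices ("logstash-2015.01.01")
            # but not the destination itself ("logstash-2015" has no trailing dot).
            "source": {"index": f"{prefix}{year}.*"},
            "dest": {"index": dest},
        }
    return bodies


if __name__ == "__main__":
    sample = ["logstash-2015.01.01", "logstash-2015.01.02", "logstash-2016.03.04"]
    for dest, body in yearly_reindex_bodies(sample).items():
        print(dest, body)
```

The destination indices would need the new template in place before the reindex runs, so that they pick up the intended mapping and shard count.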

(Christian Dahlqvist) #2

That is indeed far too many shards. If you only have 4GB of data, you should be fine using a yearly index with 1 primary shard. It may actually be easier and more efficient to delete it all and index it again from scratch using the correct template rather than trying to reindex that many indices.
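
The shard count follows directly from the indexing scheme. A rough sketch of the arithmetic, using the figures from post #1 (4 years, daily indices, 5 primary shards each, about 4GB in total) and ignoring leap days; the function name is only illustrative:

```python
def primary_shards(years, indices_per_year, shards_per_index):
    """Total primary shards for a time-based indexing scheme."""
    return years * indices_per_year * shards_per_index


daily = primary_shards(4, 365, 5)    # daily indices, 5 shards each
monthly = primary_shards(4, 12, 1)   # monthly indices, 1 shard each
yearly = primary_shards(4, 1, 1)     # yearly indices, 1 shard each

print(daily, monthly, yearly)        # 7300 48 4

# Average shard size under the daily scheme, assuming ~4GB in total.
avg_mb = 4 * 1024 / daily
print(f"{avg_mb:.2f} MB per shard")  # 0.56 MB per shard
```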

(Aarjan Baskota) #3

Would searching or managing the data become harder if we indexed yearly?
Btw, Kibana Discover has pretty good visualizations by time period.

(Christian Dahlqvist) #4

You might get away with monthly indices as well, but it all depends on your cluster spec and data volume. Having lots of very small shards is very inefficient.

(Aarjan Baskota) #5

Could we index the data quarterly, since we are pushing it through Logstash?

(Christian Dahlqvist) #6

What is your total data volume per year? How long do you plan to keep the data?

(Aarjan Baskota) #7

It is about 2-3GB/year.
We are a growing company, and our growth is more than 20%/year.
We plan to keep the data for at least 5-10 years.
It depends on team demand, but they will expect to cover more than 5 years of data, mostly for visualizations.

(Christian Dahlqvist) #8

I would recommend going for a yearly index with 1 or 2 primary shards initially. If you realise later on that the shards are getting too big, you can easily adjust then and switch to a higher number of primary shards or to monthly/quarterly indices.
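
To get a feel for how a yearly index might grow, the figures from post #7 can be projected forward. This is only a back-of-the-envelope sketch: it assumes the upper estimate of 3GB/year as the starting point and a constant 20% yearly growth, and the function name is hypothetical.

```python
def projected_yearly_gb(base_gb, growth, years_ahead):
    """Size of a single yearly index after compounding growth for some years."""
    return base_gb * (1 + growth) ** years_ahead


# Yearly index size now, in 5 years, and in 10 years.
for n in (0, 5, 10):
    print(n, round(projected_yearly_gb(3.0, 0.20, n), 1))

# Total data held if ten consecutive yearly indices are all kept.
total_gb = sum(projected_yearly_gb(3.0, 0.20, n) for n in range(10))
print(round(total_gb, 1))
```

Under these assumptions a single yearly index stays below 20GB even after ten years, which gives some room before the index layout would need to be revisited.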

(Aarjan Baskota) #9

Thank you very much for your advice.
And I don't think it would be a problem to query the indices using a wildcard if we change the index pattern to monthly/quarterly later on.

Sorry to poke you again, but how can I write the index pattern for quarterly indices in the Logstash Elasticsearch output?

(Christian Dahlqvist) #10

As long as you have the same prefix, that will not be a problem. I am, however, not sure how to create a quarterly pattern in Logstash.
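
The quarter itself is easy to derive from an event's timestamp. The sketch below shows that logic in Python only as an illustration; the prefix, the `-qN` suffix format, and the function name are assumptions, and how best to wire this into a Logstash pipeline is left open (one possibility would be to compute the value in a filter and reference the resulting field from the output's `index` setting).

```python
from datetime import datetime


def quarterly_index(prefix, ts):
    """Return an index name such as 'logstash-2017-q3' for the given timestamp."""
    # Months 1-3 map to quarter 1, 4-6 to quarter 2, and so on.
    quarter = (ts.month - 1) // 3 + 1
    return f"{prefix}{ts.year}-q{quarter}"


print(quarterly_index("logstash-", datetime(2017, 8, 15)))   # logstash-2017-q3
print(quarterly_index("logstash-", datetime(2018, 1, 1)))    # logstash-2018-q1
```

Because every name keeps the same prefix, a wildcard such as `logstash-*` would still match yearly, quarterly, and monthly indices alike.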
