Creating a new index when the current one reaches a limit

(makovskij) #1

As in the answer from @tylerjl in my previous post (link), we'd like to handle the customers with filtered aliases.
Does anyone have experience using filtered aliases against this much data, and did you run into problems?

To avoid creating unnecessary shards, we would like to check the current index, which has only one or a few shards, and create a new index once a limit is reached. (For example, put everything in the index "allcostumers_2015_05-1" and, once the limit is reached, create the index "allcostumers_2015_05-2", where 2015 is the year and 05 the month.)
The questions are: how should we set the limit?
By size, by number of documents, or both?
What would be a good limit per shard, e.g. 100 million documents or 50 GB?
Has anyone built something like this and would be willing to share their experience?
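A minimal Python sketch of the rollover logic described above. The naming pattern `<prefix>_<YYYY>_<MM>-<n>` is taken from the example index names; the helper names (`first_index_name`, `next_index_name`, `should_roll_over`) and the example limits are hypothetical, and the check simply rolls over as soon as either limit is reached.

```python
import re

# Naming scheme from the example in the post: <prefix>_<YYYY>_<MM>-<sequence>,
# e.g. "allcostumers_2015_05-1". The helpers below are hypothetical.
INDEX_NAME = re.compile(r"^(?P<base>.+_\d{4}_\d{2})-(?P<seq>\d+)$")


def first_index_name(prefix: str, year: int, month: int) -> str:
    """Name of the first index of a new month's series."""
    return f"{prefix}_{year:04d}_{month:02d}-1"


def next_index_name(current: str) -> str:
    """Name of the next index in the same month's series."""
    match = INDEX_NAME.match(current)
    if match is None:
        raise ValueError(f"unexpected index name: {current!r}")
    return f"{match['base']}-{int(match['seq']) + 1}"


def should_roll_over(size_bytes: int, doc_count: int,
                     max_bytes: int, max_docs: int) -> bool:
    """True as soon as either the size limit or the document limit is reached."""
    return size_bytes >= max_bytes or doc_count >= max_docs


if __name__ == "__main__":
    # Example limits only (assumption): 50 GB and 100 million documents.
    limit_bytes = 50 * 1024**3
    limit_docs = 100_000_000
    current = "allcostumers_2015_05-1"
    if should_roll_over(52 * 1024**3, 80_000_000, limit_bytes, limit_docs):
        print(next_index_name(current))  # allcostumers_2015_05-2
```

Checking both size and document count means whichever limit is hit first triggers the new index; actually creating that index and updating the filtered aliases would still have to be done against Elasticsearch.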

Some information about our documents and system:

We expect about 216 million documents, roughly 1.6 TB, this year.

We have about 1,000 customers and would like to use one index per month.

3 dedicated master nodes, 3 client nodes for imports and 2 client nodes for searches.
9 data nodes, each with 8 cores, 8-16 GB RAM and 1 TB of disk.


(Mark Walkom) #2

We recommend keeping shards under 50 GB and under 2 billion documents, the latter being a hard Lucene limit.

Having only a single shard per index doesn't really help concurrency, though, and this approach looks like an attempt to solve a problem you don't actually have.

(makovskij) #3

That means:
1600 GB / 12 months ≈ 133 GB per month

so a minimum of 3 shards (3 × 50 GB = 150 GB)

Would 5 shards (5 × 50 GB = 250 GB) be a good number for a monthly index?
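The shard arithmetic above can be written as a small helper. It assumes the yearly volume is spread evenly over the 12 months, counts primary data only (replicas are not included), and uses the per-shard guidance from the reply: 50 GB and roughly 2 billion documents. `shards_needed` is a hypothetical name.

```python
import math

# Per-shard guidance from the reply; 2 billion is the rounded Lucene document cap.
MAX_SHARD_GB = 50
MAX_SHARD_DOCS = 2_000_000_000


def shards_needed(monthly_gb: float, monthly_docs: int) -> int:
    """Smallest primary shard count that keeps every shard under both limits."""
    by_size = math.ceil(monthly_gb / MAX_SHARD_GB)
    by_docs = math.ceil(monthly_docs / MAX_SHARD_DOCS)
    return max(1, by_size, by_docs)


# Yearly figures from the first post, assumed to be spread evenly over 12 months.
monthly_gb = 1600 / 12             # about 133 GB
monthly_docs = 216_000_000 // 12   # 18 million documents

print(shards_needed(monthly_gb, monthly_docs))          # 3
print(shards_needed(2 * monthly_gb, 2 * monthly_docs))  # 6 if the data doubles
```

With these numbers the size limit dominates by far; the document limit only matters for very small documents.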

Back to the original topic: the idea of creating a new index once the limit is reached. For example, our data could double or grow even more, which would push it above 250 GB per month.
Would it work to ask Elasticsearch whether the index has reached its size limit? Has anyone built a mechanism like that? What has your experience been?
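One way to ask Elasticsearch is the index stats API (`GET /<index>/_stats/docs,store`), whose response reports primary-shard document counts and store size under `_all.primaries`. Below is a minimal sketch using only the Python standard library, assuming an unauthenticated cluster reachable over plain HTTP; the helper names are hypothetical. Comparing totals against `number_of_shards` times the per-shard limit assumes documents are spread evenly across shards; with custom routing (for example per customer), `level=shards` returns per-shard numbers instead. Later Elasticsearch versions also added a built-in rollover API (`POST /<alias>/_rollover` with conditions such as `max_docs`) for this pattern.

```python
import json
import urllib.request


def fetch_index_stats(base_url: str, index: str) -> dict:
    """GET /<index>/_stats/docs,store (assumes no authentication is required)."""
    url = f"{base_url}/{index}/_stats/docs,store"
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def primary_usage(stats: dict) -> tuple:
    """Return (store size in bytes, document count) for the primary shards."""
    primaries = stats["_all"]["primaries"]
    return primaries["store"]["size_in_bytes"], primaries["docs"]["count"]


def limit_reached(stats: dict, number_of_shards: int,
                  max_shard_bytes: int, max_shard_docs: int) -> bool:
    """Compare totals against per-shard limits times the shard count.

    Assumes documents are spread evenly across primary shards.
    """
    size_bytes, doc_count = primary_usage(stats)
    return (size_bytes >= number_of_shards * max_shard_bytes
            or doc_count >= number_of_shards * max_shard_docs)
```

For example, `limit_reached(fetch_index_stats("http://localhost:9200", "allcostumers_2015_05-1"), 5, 50 * 1024**3, 2_000_000_000)` would decide whether it is time to create "allcostumers_2015_05-2"; the host URL is a placeholder, and a periodic job could run this check.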
