We have an index that is growing by around 500 TB per week.
Currently the index is about 2 TB in size with 3 replicas, and indexing a 750 MB document takes around 20-30 minutes. A lot of files to upload have piled up, and we are unable to catch up.
We have a 10-node cluster (Windows Azure VMs) with 4 data nodes, 3 master nodes, and 3 client nodes. The data nodes have 56 GB RAM and 8 cores each.
What we really want to find out is: would daily, weekly, or monthly indices be a better option than a single huge index?
If we use smaller indices, will maintaining them become an issue over the long term? If so, what sort of challenges can we expect?
@Christian_Dahlqvist Our indices will be write-heavy (around 5-8 GB per day for each index) and read-heavy too, and the documents are immutable.
We will load the data once a day, but currently indexing runs all the time because it is so slow. It is a platform that will be used by at least 100 members, though they may not all be concurrent.
I answered your other question and think you will benefit from switching to time-based indices. This allows a smaller set of indices to be targeted if you are only looking at data within a limited time frame.
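As a minimal sketch of what that could look like (the host, index names, and document type below are placeholders, not anything from your setup): write each day's documents into a date-stamped index, then let queries over a limited time frame target only the matching indices.

```
# Index each day's documents into a date-stamped index instead of one huge index
curl -XPOST 'http://localhost:9200/docs-2016.07.21/doc' -d '{
  "title": "example document"
}'

# A query over a limited time frame then only touches the matching indices
curl -XGET 'http://localhost:9200/docs-2016.07.*/_search' -d '{
  "query": { "match_all": {} }
}'
```

Queries that need to span all time can still hit `docs-*` (or an alias covering the indices), so the application does not need to know individual index names.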
The ideal time period an index should cover varies by use case. Adjust the number of primary shards based on the number of nodes in the cluster (to spread the data out) as well as the volume indexed per day. Make sure you do not end up with shards that are too small or too large. Having a large number of very small shards is inefficient, as each shard has some overhead, while shards that are too large can affect query performance as well as recovery.
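To make that concrete with your numbers: at 5-8 GB per day, daily indices with 4 primary shards would give shards of only about 1-2 GB each, which is on the small side, whereas weekly indices (roughly 35-56 GB, so about 9-14 GB per shard) may be a better fit. A sketch of an index template that applies shard settings to every new time-based index (the template name, pattern, and shard/replica counts are assumptions to tune for your cluster; newer Elasticsearch versions use `index_patterns` instead of `template`):

```
# Hypothetical template: applied automatically to every new index matching docs-*.
# 4 primary shards to spread data across the 4 data nodes; adjust replicas to
# your own redundancy requirement.
curl -XPUT 'http://localhost:9200/_template/docs_template' -d '{
  "template": "docs-*",
  "settings": {
    "number_of_shards": 4,
    "number_of_replicas": 1
  }
}'
```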