Help to increase number of shards on active index running close to 2B records

saiguru · July 2, 2020, 12:50am

I have a cluster with 3 master nodes and 2 data nodes, each with 1 TB.
I built a production index and it is approaching the limit of 2B documents. How can I increase the number of shards to 10 on the running index?

ip         heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
10.21.17.8           81          99  52    1.92    1.52     1.42 dil       -      elk-data-vm3
10.21.17.5           26          66   0    0.00    0.01     0.00 ilm       -      elk-master-vm2
10.21.17.4           69          99  18    1.26    1.00     0.98 dil       -      elk-data-vm4
10.21.17.6           25          48   0    0.00    0.00     0.00 ilm       -      elk-master-vm0
10.21.17.7           35          64   1    0.00    0.02     0.00 ilm       *      elk-master-vm1

epoch      timestamp cluster         status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1593650988 00:49:48  aiopselkcluster red             5         2     71  36    0    0       19             0                  - 78.9%

health status index                uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   insight_tapm_prod_v1 dEEH71UwQe6enExvoKkpCw   1   2 1912941427            0      156gb           78gb

AClerk · July 2, 2020, 2:53am

AFAIK
You can create an index template with the desired settings.
Then re-index.

But it sounds a bit odd that you have such a large index.
Maybe consider re-designing your indexing solution?

Thanks
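
A minimal sketch of what such a template could look like on a 7.x cluster, using the legacy _template endpoint. The template name, the insight_tapm_prod_* pattern, and the shard and replica counts are assumptions for illustration, not recommended values.

```console
# Hypothetical template: applies to indices created AFTER the template exists
PUT /_template/insight_tapm_prod
{
  "index_patterns": ["insight_tapm_prod_*"],
  "settings": {
    "index.number_of_shards": 10,
    "index.number_of_replicas": 1
  }
}
```

A template only takes effect when a matching index is created, so it does not change an index that already exists. That is why the re-index step is still needed.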

saiguru · July 2, 2020, 4:21am

Thanks for the quick response. We did not use a template when creating the index.

We are working with large machine learning logs and data sets. Is there a recommended pattern for templates and indices?

Christian_Dahlqvist · July 2, 2020, 5:05am

Which version of Elasticsearch are you using? Are you just inserting new data or updating and deleting as well?

saiguru · July 2, 2020, 2:11pm

7.6.2
We are inserting new records every 5 minutes as we process data with ML models.

What I need quick help with is increasing the number of shards on the existing index. I tried making the index read-only and then increasing the shards.
I would like to know if I am making a mistake in these steps.

defalt · July 2, 2020, 2:24pm

With this changing shard size you can. Just kidding.
You will have to split your index into a new one with more shards. The number of shards you can split into is constrained by the index's number_of_routing_shards setting. That setting is static, so it is fixed when the index is created and cannot be changed on an existing index. Set the index to read-only and perform the split.

POST /index/_split/split-index
{
  "settings": {
    "index.number_of_shards": 20
  }
}

Cheers,
yoda.
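
A split like this runs in the background, so it may help to watch its progress. A small sketch, assuming the target index is named split-index as in the request above; the 60s timeout is an arbitrary illustrative value.

```console
# Show shard recovery progress for the split target
GET /_cat/recovery/split-index?v

# Block until the target's primary shards are allocated (status yellow or better)
GET /_cluster/health/split-index?wait_for_status=yellow&timeout=60s
```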

saiguru · July 2, 2020, 2:41pm

To set the index as read-only, is the command below right?
PUT //_settings
{
  "index": {
    "blocks.read_only": true,
    "blocks.read_only_allow_delete": true
  }
}

defalt · July 2, 2020, 2:44pm

Yes, looks right to me. But be aware that you can't index anything into the index once it is read-only (as the name suggests). You will have a downtime of a day or more if you haven't set up countermeasures. Splitting the index will take time because your index is huge.
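
For comparison, the split index documentation prepares the source index with a write block (index.blocks.write) rather than blocks.read_only. A sketch, using /index/ as a placeholder for the real index name:

```console
# Block writes on the source index before splitting; reads still work
PUT /index/_settings
{
  "settings": {
    "index.blocks.write": true
  }
}

# Remove the block again afterwards by resetting the setting
PUT /index/_settings
{
  "settings": {
    "index.blocks.write": null
  }
}
```

Either way, indexing into the source index is rejected while the block is in place, so the downtime concern is the same.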

saiguru · July 2, 2020, 2:48pm

Oops, good point. I can't do this in real time; thanks for pointing that out.
What is the recommended action?
Option 1: Create a new index with more shards and push data there.
Option 2: Take downtime to split the index.

Any more options?

defalt · July 2, 2020, 2:53pm

My idea would be:

Create a new index with more shards and the right mapping.
Index new data into this new index.
_reindex the old index into the new one.

I am not 100% sure this works, so I would wait for @Christian_Dahlqvist, @dadoonet or another Elasticsearch Team member to verify these steps. You don't want to risk data loss.
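
The steps above could be sketched roughly as follows. This is an untested outline, not a verified procedure: the target name insight_tapm_prod_v2 and the shard and replica counts are assumptions, and the mapping is omitted because it is not shown in this thread.

```console
# 1. Create the new index with more shards (add the right mappings here)
PUT /insight_tapm_prod_v2
{
  "settings": {
    "index.number_of_shards": 10,
    "index.number_of_replicas": 1
  }
}

# 2. Point the ingest process at insight_tapm_prod_v2, then
# 3. copy the old documents over as a background task
POST /_reindex?wait_for_completion=false
{
  "conflicts": "proceed",
  "source": { "index": "insight_tapm_prod_v1" },
  "dest": { "index": "insight_tapm_prod_v2", "op_type": "create" }
}
```

With op_type set to create and conflicts set to proceed, a document that already exists in the new index under the same ID is skipped instead of overwritten, so newer data written there is kept. The request returns a task ID that can be checked with the task management API.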
