How to minimize the reindexing time

I am running an ELK 7.13 cluster with 3 master nodes, 6 data nodes, and 2 coordinating nodes.

Master node configuration

  1. Heap Size: 8gb
  2. Storage: 50gb(ssd)
  3. CPU: 4

Data node configuration

  1. Heap Size: 28gb
  2. Storage: 1tb(ssd)
  3. CPU: 8

Coordinating node configuration

  1. Heap Size: 8gb
  2. Storage: 50gb(ssd)
  3. CPU: 4

Current indices:

health status index                            uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   log-wlb-sysmon-2021.10.01-000014 OaDubXBDTFuwVqYjLIIcXg   1   1   87312747            0    114.4gb         57.2gb
green  open   log-wlb-sysmon-2021.09.30-000013 _ukLur8LR0aY4pUJPnGIcw   1   1   85912426            0    113.9gb         56.9gb
green  open   log-wlb-sysmon-2021.08.16-000015 EtF6odgpQGGRjQ0Nywqtxg   1   1  102154111            0    102.2gb         51.1gb
green  open   log-wlb-sysmon-2021.10.02-000015 sc3bQuCeQEmABwrMMjh8Gw   1   0    1280767            0    592.9mb        592.9mb
green  open   log-wlb-sysmon-2021.09.23-000019 tevWknvdRY-XwrEARoNY9g   1   1   44918363       166891     53.9gb         26.9gb
green  open   log-wlb-sysmon-2021.08.27-000017 hAGeCkqhRlCBvmwBLonjaQ   1   1   93303600            0      102gb           51gb
green  open   log-wlb-sysmon-2021.08.19-000016 mkBAnhNMReyHzw0PJwkCHg   1   1   83527835            0      102gb           51gb
green  open   log-wlb-sysmon-2021.09.07-000018 HuDxDct7RGKOcuYFXCKA8w   1   1   92614553            0    101.7gb         50.8gb

I want to reindex these indices. log-wlb-sysmon-2021.09.30-000013 and log-wlb-sysmon-2021.10.01-000014 have already been reindexed, and I am currently reindexing log-wlb-sysmon-2021.10.02-000015.
As you can see, log-wlb-sysmon-2021.09.30-000013 and log-wlb-sysmon-2021.10.01-000014 took 2 days to reindex.

This is even though I set the replica count to 0 and disable refresh on the destination index before starting the reindex:

PUT %3Clog-wlb-sysmon-%7Bnow%2Fd%7D-000015%3E
{
  "aliases": {
    "log-wlb-sysmon": {
      "is_write_index": false
    }
  },
  "settings": {
    "index.lifecycle.parse_origination_date": true,
    "refresh_interval": "-1",
    "number_of_replicas": 0
  }
}

Reindexing these millions of documents takes a day, and then enabling replicas and force merging takes another 3 to 4 hours.
I want to reindex these millions of documents in just 2-3 hours. How can I reduce the time the reindex API takes?
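
For reference, the steps that follow the reindex (turning refresh and replicas back on, then force merging) might look roughly like the sketch below. The index name is the placeholder used elsewhere in this thread, and the "1s" refresh interval, the single replica, and max_num_segments=1 are assumptions rather than settings taken from this cluster.

```console
# Restore refresh and replicas on the destination index (values are assumed)
PUT my-new-index-000001/_settings
{
  "index": {
    "refresh_interval": "1s",
    "number_of_replicas": 1
  }
}

# Force merge the destination index (target segment count is an assumption)
POST my-new-index-000001/_forcemerge?max_num_segments=1
```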

Are you using slices to reindex?

No, I am not using the slices option in reindex.

You should.

POST _reindex
{
  "source": {
    "index": "my-index-000001",
    "slice": {
      "id": 1,
      "max": 2
    }
  },
  "dest": {
    "index": "my-new-index-000001"
  }
}

What are id and max? This is the first time I will be using them in a reindex.
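
For context, in manual slicing max is the total number of slices and id is the zero-based number of the slice that a given request handles, so each slice needs its own request. With max set to 2, the request above (id 1) would be paired with a second request for id 0. A sketch, reusing the same placeholder index names:

```console
# Companion request to the one above: slice 0 of 2
POST _reindex
{
  "source": {
    "index": "my-index-000001",
    "slice": {
      "id": 0,
      "max": 2
    }
  },
  "dest": {
    "index": "my-new-index-000001"
  }
}
```

Both requests have to be sent for the whole index to be copied.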

I'd just use something like:

POST _reindex?slices=5
{
  "source": {
    "index": "my-index-000001"
  },
  "dest": {
    "index": "my-new-index-000001"
  }
}

And see if this improves things.
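
One way to check whether it helps is to start the reindex as a background task and look at its progress while it runs. This is only a sketch: the index names are the placeholders from above, and slices=5 is the value suggested here, not a tuned number.

```console
# Start the sliced reindex in the background; the response returns a task ID
POST _reindex?slices=5&wait_for_completion=false
{
  "source": {
    "index": "my-index-000001"
  },
  "dest": {
    "index": "my-new-index-000001"
  }
}

# List running reindex tasks together with their status
GET _tasks?detailed=true&actions=*reindex
```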

POST _reindex?slices=5
{
  "source": {
    "index": "my-index-000001",
    "size": 8000
  },
  "dest": {
    "index": "my-new-index-000001"
  }
}

Can I use the size parameter in source? Will it work?

You can, but don't overload your nodes.
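
If the nodes do start to struggle, the reindex API can be throttled with requests_per_second, and a running task can be rethrottled without restarting it. The sketch below is illustrative only: the value 10000 is arbitrary, and <task_id> is a placeholder for the ID returned when the reindex is started.

```console
# Start a throttled, sliced reindex (the throttle value is an arbitrary example)
POST _reindex?slices=5&requests_per_second=10000
{
  "source": {
    "index": "my-index-000001",
    "size": 8000
  },
  "dest": {
    "index": "my-new-index-000001"
  }
}

# Change the throttle on a running reindex; -1 removes throttling
POST _reindex/<task_id>/_rethrottle?requests_per_second=-1
```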
