Indices recovery speed is strange


(Leo Niu) #1

Hi, this situation came up while I was testing Elasticsearch recovery speed. I am running Elasticsearch 5.3.0 as a one-node cluster with a heap size of 3 GB. The indices look like this:

health status index                                  uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   fulleth_smartprobe2_nic0_4h_2018010304 As0aPKxlScKpWFgnY30ZTQ   1   0     600000            0    131.6mb        131.6mb
green  open   fulleth_smartprobe3_nic0_4h_2018010116 SPCef_8IQd27dD2C-SeixQ   1   0     315000            0     69.4mb         69.4mb
green  open   fulleth_smartprobe0_nic0_4h_2018010416 60HsxuA_TYqu9PUqqcrKqw   1   0     600000            0    131.5mb        131.5mb
green  open   fulleth_smartprobe3_nic0_4h_2018010400 TiWk16DpRZSDRV7a8gxaoA   1   0     600000            0    130.6mb        130.6mb
green  open   fulleth_smartprobe0_nic0_4h_2018010304 hRavNJEbQJuDbRqSo4QOOA   1   0     600000            0    132.1mb        132.1mb
green  open   fulleth_smartprobe1_nic0_4h_2018010400 AFZNjKPpRNKwPGyDVyW6aQ   1   0     600000            0    131.6mb        131.6mb
green  open   fulleth_smartprobe3_nic0_4h_2018010100 lCNHIt_2RE2-ghJcLpnl1A   1   0     600000            0    132.2mb        132.2mb
green  open   fulleth_smartprobe1_nic0_4h_2018010120 hMunDg7TTraEeasTZbIMIg   1   0     600000            0    131.6mb        131.6mb
green  open   fulleth_smartprobe1_nic0_4h_2018010304 ux-3YlAdRK69d8i7LubROA   1   0     600000            0    131.7mb        131.7mb
green  open   fulleth_smartprobe2_nic0_4h_2018010208 0PzF_5sRRjaB4v1jDsxraQ   1   0     600000            0    131.7mb        131.7mb

I have 93 indices in total, most of them the same size (for testing). All other settings are defaults. The first time I restart the cluster with systemctl restart elasticsearch, recovery takes about 120 seconds. Then I run echo 3 > /proc/sys/vm/drop_caches to clear all Linux caches and restart the cluster again. This time recovery takes only about 25 seconds.
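For anyone who wants to reproduce these timings, here is a minimal sketch of one way to measure them: poll the _cluster/health endpoint after the restart and stop the clock when the status turns green. It assumes the node listens on http://localhost:9200 without authentication, which is not stated in this post; the function names fetch_health and time_until_green are made up for illustration.

```python
import json
import time
import urllib.request


def fetch_health(base_url="http://localhost:9200"):
    """Return the parsed _cluster/health response, or None if the node is not reachable yet."""
    try:
        with urllib.request.urlopen(base_url + "/_cluster/health", timeout=2) as resp:
            return json.load(resp)
    except (OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a body that is not valid JSON.
        return None


def time_until_green(fetch=fetch_health, poll_interval=0.5, timeout=600.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll until the cluster reports green and return the elapsed seconds."""
    start = clock()
    while clock() - start < timeout:
        health = fetch()
        if health is not None and health.get("status") == "green":
            return clock() - start
        sleep(poll_interval)
    raise TimeoutError("cluster did not reach green status in time")
```

Calling time_until_green() right after systemctl restart elasticsearch returns should give a number comparable to the ones quoted here. The measurement includes node startup time, not only shard recovery, so treat it as a rough figure.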

After that, I changed indices.recovery.max_bytes_per_sec to 120 MB and then to 2 MB, but recovery still takes about 25 seconds to finish.

So I have two questions: why is the first restart so much slower than the later ones, and why does indices.recovery.max_bytes_per_sec have no effect? Thanks.

{
  "acknowledged" : true,
  "persistent" : {
    "indices" : {
      "recovery" : {
        "max_bytes_per_sec" : "120mb"
      }
    }
  },
  "transient" : { }
}
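The response above is what the cluster settings API returns after a persistent update. As a rough sketch, the request that produces it could be built like this; the host and port (http://localhost:9200) and the helper names recovery_rate_body and build_settings_request are assumptions for illustration, not something taken from this post. As far as I understand, this setting throttles recovery traffic between nodes, so it may not apply when a single node recovers shards from its own disk.

```python
import json
import urllib.request


def recovery_rate_body(rate, persistent=True):
    """Build the cluster settings body for indices.recovery.max_bytes_per_sec."""
    scope = "persistent" if persistent else "transient"
    return {scope: {"indices.recovery.max_bytes_per_sec": rate}}


def build_settings_request(rate, base_url="http://localhost:9200"):
    """Return a PUT request for /_cluster/settings; nothing is sent until urlopen is called."""
    data = json.dumps(recovery_rate_body(rate)).encode("utf-8")
    return urllib.request.Request(
        base_url + "/_cluster/settings",
        data=data,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

To apply the change, pass the request to urllib.request.urlopen, for example urllib.request.urlopen(build_settings_request("2mb")). Keeping the request construction separate from the network call makes it easy to inspect the body first.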

(Christian Dahlqvist) #2

Are you using hourly indices? It looks like you have a lot of very small indices and shards, which can be very inefficient and result in a large cluster state. Read this blog post on shards and sharding for some guidance.


(Leo Niu) #3

Thanks for the reply.
I use small indices and shards because they make it easier to test recovery speed. In production I follow the suggested sharding strategy (fewer than 500 shards, each 20-40 GB, for a 32 GB heap). This cluster exists only for testing recovery speed, so I created it with small indices.

