Restore from s3_repository stops part way through


(Martin Stevens) #1

Hi,

We are trying to restore from an s3_repository snapshot, but some of the indices never get past a certain point. We install ES with Ansible, build the AMI with Packer, and then spin up a new box with a larger disk.

The restore process always stalls on the same indices at the same point.

1st run, on a new EC2 instance:

news        0 4.3s  snapshot done  n/a n/a 172.31.46.121 YIuL9Ew s3_repository full_snapshot_20181002_121100 39  39  100.0% 39  2491209    2491209    100.0% 2491209    0 0 100.0%
news        1 3.6s  snapshot done  n/a n/a 172.31.46.121 YIuL9Ew s3_repository full_snapshot_20181002_121100 36  36  100.0% 36  2566507    2566507    100.0% 2566507    0 0 100.0%
news        2 3.9s  snapshot done  n/a n/a 172.31.46.121 YIuL9Ew s3_repository full_snapshot_20181002_121100 41  41  100.0% 41  2533518    2533518    100.0% 2533518    0 0 100.0%
news        3 3.2s  snapshot done  n/a n/a 172.31.46.121 YIuL9Ew s3_repository full_snapshot_20181002_121100 24  24  100.0% 24  2359775    2359775    100.0% 2359775    0 0 100.0%
news        4 4.2s  snapshot done  n/a n/a 172.31.46.121 YIuL9Ew s3_repository full_snapshot_20181002_121100 38  38  100.0% 38  2359664    2359664    100.0% 2359664    0 0 100.0%
transcripts 0 51.8s snapshot done  n/a n/a 172.31.46.121 YIuL9Ew s3_repository full_snapshot_20181002_121100 72  72  100.0% 72  1140938490 1140938490 100.0% 1140938490 0 0 100.0%
transcripts 1 52.3s snapshot done  n/a n/a 172.31.46.121 YIuL9Ew s3_repository full_snapshot_20181002_121100 83  83  100.0% 83  1145127154 1145127154 100.0% 1145127154 0 0 100.0%
transcripts 2 1m    snapshot done  n/a n/a 172.31.46.121 YIuL9Ew s3_repository full_snapshot_20181002_121100 89  89  100.0% 89  1142664174 1142664174 100.0% 1142664174 0 0 100.0%
transcripts 3 53.3s snapshot done  n/a n/a 172.31.46.121 YIuL9Ew s3_repository full_snapshot_20181002_121100 75  75  100.0% 75  1156088690 1156088690 100.0% 1156088690 0 0 100.0%
transcripts 4 4m    snapshot index n/a n/a 172.31.46.121 YIuL9Ew s3_repository full_snapshot_20181002_121100 87  7   8.0%   87  1132775561 653857166  57.7%  1132775561 0 0 100.0%
edgar       0 11.9m snapshot index n/a n/a 172.31.46.121 YIuL9Ew s3_repository full_snapshot_20181002_121100 167 23  13.8%  167 2270559236 580570940  25.6%  2270559236 0 0 100.0%
edgar       1 11.9m snapshot index n/a n/a 172.31.46.121 YIuL9Ew s3_repository full_snapshot_20181002_121100 89  48  53.9%  89  2285799498 153191569  6.7%   2285799498 0 0 100.0%
edgar       2 11.9m snapshot index n/a n/a 172.31.46.121 YIuL9Ew s3_repository full_snapshot_20181002_121100 190 2   1.1%   190 2315607352 128042801  5.5%   2315607352 0 0 100.0%
edgar       3 2.3m  snapshot done  n/a n/a 172.31.46.121 YIuL9Ew s3_repository full_snapshot_20181002_121100 184 184 100.0% 184 2295838134 2295838134 100.0% 2295838134 0 0 100.0%
edgar       4 1.7m  snapshot done  n/a n/a 172.31.46.121 YIuL9Ew s3_repository full_snapshot_20181002_121100 158 158 100.0% 158 2278318029 2278318029 100.0% 2278318029 0 0 100.0%

2nd run, on another EC2 instance:

news  0 5.1s  snapshot done  n/a n/a 172.31.44.158 YIuL9Ew s3_repository full_snapshot_20181002_121100 39  39 100.0% 39  2491209    2491209   100.0% 2491209    0 0 100.0%
news  1 4.9s  snapshot done  n/a n/a 172.31.44.158 YIuL9Ew s3_repository full_snapshot_20181002_121100 36  36 100.0% 36  2566507    2566507   100.0% 2566507    0 0 100.0%
news  2 5.2s  snapshot done  n/a n/a 172.31.44.158 YIuL9Ew s3_repository full_snapshot_20181002_121100 41  41 100.0% 41  2533518    2533518   100.0% 2533518    0 0 100.0%
news  3 3.9s  snapshot done  n/a n/a 172.31.44.158 YIuL9Ew s3_repository full_snapshot_20181002_121100 24  24 100.0% 24  2359775    2359775   100.0% 2359775    0 0 100.0%
news  4 3.4s  snapshot done  n/a n/a 172.31.44.158 YIuL9Ew s3_repository full_snapshot_20181002_121100 38  38 100.0% 38  2359664    2359664   100.0% 2359664    0 0 100.0%
edgar 0 16.7m snapshot index n/a n/a 172.31.44.158 YIuL9Ew s3_repository full_snapshot_20181002_121100 167 21 12.6%  167 2270559236 418857037 18.4%  2270559236 0 0 100.0%
edgar 1 16.7m snapshot index n/a n/a 172.31.44.158 YIuL9Ew s3_repository full_snapshot_20181002_121100 89  48 53.9%  89  2285799498 994567201 43.5%  2285799498 0 0 100.0%
edgar 2 16.7m snapshot index n/a n/a 172.31.44.158 YIuL9Ew s3_repository full_snapshot_20181002_121100 190 2  1.1%   190 2315607352 145929225 6.3%   2315607352 0 0 100.0%
edgar 3 16.7m snapshot index n/a n/a 172.31.44.158 YIuL9Ew s3_repository full_snapshot_20181002_121100 184 8  4.3%   184 2295838134 142034298 6.2%   2295838134 0 0 100.0%
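
To compare the two runs, the shards that never reach the done stage can be pulled out of the _cat/recovery output with a short Python sketch. It assumes the default 6.x column order seen in these rows (stage in the 5th column, files_percent in the 14th, bytes_percent in the 18th); stalled_shards and ShardProgress are hypothetical names made up for this example.

```python
# Sketch: list shards whose snapshot restore has not reached the "done" stage,
# parsed from plain-text _cat/recovery output.
# ASSUMPTION: default Elasticsearch 6.x column order (index, shard, time, type,
# stage, ..., files_percent at column 14, bytes_percent at column 18, 1-based).
from typing import List, NamedTuple


class ShardProgress(NamedTuple):
    index: str
    shard: int
    time: str
    stage: str
    files_percent: str
    bytes_percent: str


def stalled_shards(cat_recovery: str) -> List[ShardProgress]:
    """Return one ShardProgress per data row whose stage is not 'done'."""
    stalled = []
    for line in cat_recovery.splitlines():
        cols = line.split()
        # Skip blank lines, a ?v header row, and anything too short to be a data row.
        if len(cols) < 18 or not cols[1].isdigit():
            continue
        if cols[4] != "done":
            stalled.append(ShardProgress(
                index=cols[0], shard=int(cols[1]), time=cols[2],
                stage=cols[4], files_percent=cols[13], bytes_percent=cols[17]))
    return stalled


if __name__ == "__main__":
    # Three rows copied from the first run above.
    sample = """\
news        0 4.3s  snapshot done  n/a n/a 172.31.46.121 YIuL9Ew s3_repository full_snapshot_20181002_121100 39  39  100.0% 39  2491209    2491209    100.0% 2491209    0 0 100.0%
transcripts 4 4m    snapshot index n/a n/a 172.31.46.121 YIuL9Ew s3_repository full_snapshot_20181002_121100 87  7   8.0%   87  1132775561 653857166  57.7%  1132775561 0 0 100.0%
edgar       1 11.9m snapshot index n/a n/a 172.31.46.121 YIuL9Ew s3_repository full_snapshot_20181002_121100 89  48  53.9%  89  2285799498 153191569  6.7%   2285799498 0 0 100.0%
"""
    for row in stalled_shards(sample):
        print(row)
```

Fed all of the first run's rows, this reports transcripts shard 4 and edgar shards 0, 1 and 2; fed the second run's rows, it reports edgar shards 0 to 3.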

We seem to have plenty of disk space (4% of the 117G root volume in use):

shards disk.indices disk.used disk.avail disk.total disk.percent host          ip            node
     9        1.5gb     4.4gb    111.8gb    116.2gb            3 172.31.44.158 172.31.44.158 YIuL9Ew
    21                                                                                       UNASSIGNED

Filesystem      Size  Used Avail Use% Mounted on
udev            2.0G     0  2.0G   0% /dev
tmpfs           395M  712K  394M   1% /run
/dev/xvda1      117G  4.5G  112G   4% /
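
To put a number on that, the bytes_total column from the first run adds up to 17,176,026,991 bytes (about 16 GiB) across all 15 shards, against the 111.8gb reported as available. The sketch below does that sum and checks it against the data volume. DATA_PATH = "/" and the use of the 85% default of cluster.routing.allocation.disk.watermark.low as a rough threshold are assumptions; restore_size_bytes and fits_under_watermark are hypothetical helpers.

```python
# Sketch: would a full restore of these shards fit under the disk low watermark?
# ASSUMPTIONS: path.data lives on DATA_PATH, and the 85% default low watermark is
# used only as a conservative rule of thumb, not as Elasticsearch's exact logic.
import shutil

DATA_PATH = "/"
LOW_WATERMARK = 0.85

# bytes_total per shard, copied from the first run's _cat/recovery output above.
SHARD_BYTES = {
    "news": [2491209, 2566507, 2533518, 2359775, 2359664],
    "transcripts": [1140938490, 1145127154, 1142664174, 1156088690, 1132775561],
    "edgar": [2270559236, 2285799498, 2315607352, 2295838134, 2278318029],
}


def restore_size_bytes(shard_bytes):
    """Total bytes a full restore of these shards has to write."""
    return sum(sum(sizes) for sizes in shard_bytes.values())


def fits_under_watermark(needed, total, used, watermark=LOW_WATERMARK):
    """True if current usage plus the restore stays below `watermark` of the disk."""
    return (used + needed) / total < watermark


if __name__ == "__main__":
    needed = restore_size_bytes(SHARD_BYTES)
    usage = shutil.disk_usage(DATA_PATH)
    print("restore needs %.1f GiB, %.1f GiB free" % (needed / 2**30, usage.free / 2**30))
    print("fits under low watermark:", fits_under_watermark(needed, usage.total, usage.used))
```
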

And the snapshot looks good:

    {
      "snapshot" : "full_snapshot_20181002_121100",
      "uuid" : "IA74nkvtQUGQzGzUUAGoBw",
      "version_id" : 6020499,
      "version" : "6.2.4",
      "indices" : [
        "transcripts",
        "news",
        "edgar"
      ],
      "include_global_state" : false,
      "state" : "SUCCESS",
      "start_time" : "2018-10-02T11:12:21.764Z",
      "start_time_in_millis" : 1538478741764,
      "end_time" : "2018-10-02T11:12:25.460Z",
      "end_time_in_millis" : 1538478745460,
      "duration_in_millis" : 3696,
      "failures" : [ ],
      "shards" : {
        "total" : 15,
        "failed" : 0,
        "successful" : 15
      }
    }
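
The same check can be scripted rather than eyeballed. A minimal sketch, assuming the entry above is one element of the snapshots array returned by GET _snapshot/s3_repository/full_snapshot_20181002_121100; snapshot_is_complete is a hypothetical helper.

```python
# Sketch: check a snapshot entry the same way it was eyeballed above.
# ASSUMPTION: `snapshot` is one element of the "snapshots" array returned by
# GET _snapshot/s3_repository/full_snapshot_20181002_121100.
import json


def snapshot_is_complete(snapshot: dict) -> bool:
    """SUCCESS state, no failures listed, and every shard reported successful."""
    shards = snapshot["shards"]
    return (
        snapshot["state"] == "SUCCESS"
        and not snapshot["failures"]
        and shards["failed"] == 0
        and shards["successful"] == shards["total"]
    )


if __name__ == "__main__":
    # Trimmed copy of the snapshot entry posted above.
    entry = json.loads("""
    {"snapshot": "full_snapshot_20181002_121100", "state": "SUCCESS",
     "indices": ["transcripts", "news", "edgar"], "failures": [],
     "shards": {"total": 15, "failed": 0, "successful": 15}}
    """)
    print(snapshot_is_complete(entry))
```

This only inspects the snapshot metadata; it does not read any segment files back from S3.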

(Yannick Welsch) #2

Have you looked at the logs on the individual nodes? Can you check hot_threads to see if something is getting stuck on the nodes?
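
Both the hot threads and the current restore progress can be grabbed in one go while the restore is stuck. A minimal sketch using only the Python standard library, assuming a node reachable over plain HTTP at localhost:9200 without authentication; ES_URL, DIAGNOSTIC_PATHS and collect_diagnostics are hypothetical names. It fetches GET _nodes/hot_threads (asking for 5 threads per node instead of the default 3) and GET _cat/recovery?v.

```python
# Sketch: fetch hot threads and restore progress from one node.
# ASSUMPTIONS: the node listens on plain HTTP at ES_URL with no auth;
# ES_URL, DIAGNOSTIC_PATHS and collect_diagnostics() are hypothetical names.
import urllib.request

ES_URL = "http://localhost:9200"

DIAGNOSTIC_PATHS = [
    "/_nodes/hot_threads?threads=5",  # busiest threads on every node (plain text)
    "/_cat/recovery?v",               # per-shard restore progress, with header row
]


def collect_diagnostics(base_url=ES_URL, paths=DIAGNOSTIC_PATHS, timeout=30):
    """Return {path: response body} for each diagnostic endpoint."""
    out = {}
    for path in paths:
        with urllib.request.urlopen(base_url + path, timeout=timeout) as resp:
            out[path] = resp.read().decode("utf-8")
    return out


if __name__ == "__main__":
    try:
        for path, body in collect_diagnostics().items():
            print("=== GET " + path)
            print(body)
    except OSError as exc:  # URLError and socket timeouts are OSError subclasses
        print("could not reach Elasticsearch at %s: %s" % (ES_URL, exc))
```

The node logs still have to be read separately; this only covers the two HTTP endpoints.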


(system) #3

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.