Migration from ES 1.5.4 to ES 6.3: HDD performance issue


(Denis Shilov) #1

Hi.

We run our db on ES 1.5.4 and decided to move it to 6.3.

A few words about the environment:
We use the same SATA HDDs everywhere; for each ES version we run a cluster of 2 servers with 32 GB RAM each.
We tried different RAID settings, and every field in ES 6.3 has its own explicit mapping (long/short/keyword, nothing else). We followed all the best-practice advice, but the problem still exists.
We also tried a single server running 1 node of ES 6.3.

Finally, about our issue:
We started writing to both databases in parallel and got high disk load on ES 6.3.
While on 1.5.4 we see ~800 write ops per 10 minutes, on 6.3 we get ~40,000 write ops per 10 minutes on the same amount of data.
It's really confusing.

For ES 1.5.4 we use Elastica lib 3.1; for ES 6.3 we use Elastica lib 6.0.1.

Settings 1.5.4:

 "settings" : {
   "index" : {
     "number_of_replicas" : "1",
     "number_of_shards" : "5",
     "refresh_interval" : "5s"
   }
 }

Settings 6.3:

"settings" : {
  "index" : {
    "refresh_interval" : "5s",
    "number_of_shards" : "5",
    "provided_name" : "male_v2",
    "merge" : {
      "scheduler" : {
        "max_thread_count" : "1"
      }
    },
    "number_of_replicas" : "1"
  }
}

(Christian Dahlqvist) #2

Are you using bulk requests? If so, what bulk size are you using? Are you just indexing new data or are you also updating existing documents? If you are updating, how frequently are you updating individual documents?
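For reference, a bulk request batches many index/update operations into a single HTTP call to the `_bulk` endpoint, which dramatically reduces per-request overhead (including translog syncs). Here is a minimal sketch of how such a request body could be built; the index name `male_v2` is taken from the settings above, and the documents and ids are purely illustrative:

```python
import json

def build_bulk_body(index, docs):
    """Build an NDJSON body for the Elasticsearch _bulk API.

    Each document becomes two lines: an action line and a source line.
    ES 6.x still requires a _type on the action line; "_doc" is the
    conventional choice.
    """
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_index": index, "_type": "_doc", "_id": doc_id}}))
        lines.append(json.dumps(source))
    # The _bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"

# Illustrative documents; POST this body to http://host:9200/_bulk
# with Content-Type: application/x-ndjson.
body = build_bulk_body("male_v2", [("1", {"age": 42}), ("2", {"age": 7})])
print(body, end="")
```

A batch of a few hundred to a few thousand documents per request is a common starting point, tuned from there by measuring.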


(Denis Shilov) #4

Hi, Christian

We don't use bulk requests with either version, by design of our system; we need changes to be visible as soon as possible.

We both index new documents and update existing ones. On ES 1.5.4 we almost always update docs: about 44 million updates per day plus a few thousand inserts. On 6.3 the share of updates is about 70% and growing, because we are still filling the db.

Both databases process the same number of docs on writes. On 6.3 there are no read ops; on 1.5.4, about 3000 per second.

Best regards.


(Christian Dahlqvist) #5

One change between Elasticsearch 1.x and 6.x that will have an impact here is that the translog is now synced to disk after every request by default, rather than periodically. This matters especially since you are not using bulk requests. The change was made to improve resilience and prevent data loss, but it leads to increased I/O.
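If per-request durability is not strictly required, the old periodic behaviour can be restored with the translog durability settings; a sketch of the relevant index settings fragment (the 5s interval shown is the default, and accepting it means up to 5 seconds of acknowledged writes can be lost on a crash):

 "settings" : {
   "index" : {
     "translog" : {
       "durability" : "async",
       "sync_interval" : "5s"
     }
   }
 }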

If you update a document that has not yet been made visible in a segment through a refresh, a refresh will be performed first. This can result in a lot of small segments and disk I/O if you update the same document frequently.
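One common mitigation while backfilling, if near-real-time reads are not needed, is to lengthen the refresh interval (or disable refresh entirely with -1) and restore it once the load is done; an illustrative fragment, with the 30s value chosen arbitrarily:

 "settings" : {
   "index" : {
     "refresh_interval" : "30s"
   }
 }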


(Denis Shilov) #6

That helps, thanks a lot.


(system) #7

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.