Elasticsearch 2.4.0 Cluster issues:

Elasticsearch 2.4.0 cluster configuration: 4 servers (8 cores, 16 GB memory, 800 GB disk, Alibaba Cloud servers)
Shard info:
• "number_of_shards": "4",
• "number_of_replicas": "1",
Document count: 512,977,028
Total data in cluster: 1.01 TB (why?)

Elasticsearch 1.7.1 cluster configuration: 4 servers (8 cores, 16 GB memory, 500 GB disk, Alibaba Cloud servers)

Shard info:
• "number_of_shards": "5",
• "number_of_replicas": "1",
Document count: 520,546,089
Total data in cluster: 751.11 GB (why?)

Issues:

The data comes from an application that uses 2 Sidekiq workers to write to the 2 clusters.
Compared with the Elasticsearch 1.7.1 cluster, the Elasticsearch 2.4.0 cluster performs much worse. Its disk reads and writes are very frequent, even though both clusters receive the same writes.

Figures (refresh_interval = 1s):
iotop on the Elasticsearch 2.4.0 cluster:

Total DISK READ : 23.38 K/s | Total DISK WRITE : 1694.99 K/s (average)
Actual DISK READ: 23.38 K/s | Actual DISK WRITE: 4.52 M/s (average)

iotop on the Elasticsearch 1.7.1 cluster:

Total DISK READ : 66.52 K/s | Total DISK WRITE : 340.43 K/s (average)
Actual DISK READ: 66.52 K/s | Actual DISK WRITE: 45.00 K/s (average)

Elasticsearch 2.4.0 startup command:

/usr/local/jdk1.8.0_66/bin/java -Xms12g -Xmx12g -Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -XX:+DisableExplicitGC -Dfile.encoding=UTF-8 -Djna.nosys=true -Des.path.home=/data/elasticsearch-node1 -cp /data/elasticsearch-node1/lib/elasticsearch-2.4.0.jar:/data/elasticsearch-node1/lib/* org.elasticsearch.bootstrap.Elasticsearch start -d

Elasticsearch 1.7.1 startup command:

/usr/bin/java -Xms12g -Xmx12g -Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -XX:+DisableExplicitGC -Dfile.encoding=UTF-8 -Delasticsearch -Des.path.home=/es/elasticsearch-node1 -cp :/es/elasticsearch-node1/lib/elasticsearch-1.7.1.jar:/es/elasticsearch-node1/lib/:/es/elasticsearch-node1/lib/sigar/ org.elasticsearch.bootstrap.Elasticsearch

What is the issue? Why does Elasticsearch 2.4.0 write to disk so frequently? (Both clusters receive the same writes.)

From China, thank you!

In 2.x, the translog is synced to disk after each operation, which may be why.
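If the per-operation translog sync does turn out to be the cause, one possible mitigation in 2.x is the per-index `index.translog.durability` setting, which can be switched from the default `request` to `async`. The following is only a sketch: the index name `my_index` and the node address `http://localhost:9200` are assumptions, not values from this thread. With `async`, operations acknowledged since the last background sync can be lost if a node crashes, so this trades durability for lower IO.

```python
import json
import urllib.request


def translog_settings_body(durability="async"):
    """Build the JSON body for an index settings update.

    "request" syncs the translog on every request (the 2.x default);
    "async" syncs it in the background at a fixed interval instead.
    """
    if durability not in ("request", "async"):
        raise ValueError("durability must be 'request' or 'async'")
    return json.dumps({"index": {"translog": {"durability": durability}}})


def apply_translog_settings(base_url, index, durability="async"):
    """PUT the settings to <base_url>/<index>/_settings.

    base_url and index are hypothetical placeholders supplied by the caller.
    """
    request = urllib.request.Request(
        "{}/{}/_settings".format(base_url, index),
        data=translog_settings_body(durability).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (not executed here; assumes a reachable node):
# apply_translog_settings("http://localhost:9200", "my_index")
```

Comparing iotop before and after the change on a test index would show whether the translog sync is really what drives the extra writes.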

Are you indexing using bulk requests? Elasticsearch 2.x syncs to disk more frequently than Elasticsearch 1.x in order to improve resiliency, which results in more IO, especially if you are not using bulk indexing. The difference in index size may be because doc_values are enabled by default in Elasticsearch 2.x. This results in larger indices on disk, but significantly reduces heap pressure, which is usually a good tradeoff.
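To put the size difference in perspective, the totals reported in the question can be converted to an average on-disk size per document. This is a rough sketch that assumes binary units (1 TB = 1024 GB) and that both totals and both document counts are measured the same way (for example, both including or both excluding replicas), which the thread does not state.

```python
GIB = 1024 ** 3  # bytes per GB, assuming binary units


def bytes_per_doc(total_gb, doc_count):
    """Average on-disk bytes per document for a cluster."""
    return total_gb * GIB / doc_count


# Figures taken from the question above.
es_2_4 = bytes_per_doc(1.01 * 1024, 512_977_028)  # 1.01 TB, 2.4.0 cluster
es_1_7 = bytes_per_doc(751.11, 520_546_089)       # 751.11 GB, 1.7.1 cluster
ratio = es_2_4 / es_1_7

print("2.4.0: {:.0f} bytes/doc".format(es_2_4))
print("1.7.1: {:.0f} bytes/doc".format(es_1_7))
print("ratio: {:.2f}".format(ratio))
```

Under those assumptions, each document takes roughly 1.4 times as much disk space on the 2.4.0 cluster, which is the kind of increase that additional per-field data such as doc_values could plausibly explain.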

We don't use bulk requests. We still have not found the cause. Would Elasticsearch 5.0 solve this problem?

Use bulk then!

It will help you a lot.

Bulk requests have always been the best way to get optimal indexing performance in Elasticsearch. The difference in performance between individual indexing requests and bulk requests is even more pronounced in Elasticsearch 2.x and 5.x due to the changes I described, so I would highly recommend considering bulk requests.
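As an illustration of what switching to bulk involves, here is a minimal Python sketch that builds the newline-delimited body expected by the `_bulk` endpoint: one action line followed by one source line per document, with a trailing newline. The index name `my_index`, type name `my_type`, the sample documents, and the batch size are all hypothetical; the writers in this thread are Sidekiq jobs, so the real implementation would live in Ruby, but the request shape is the same.

```python
import json


def build_bulk_body(docs, index, doc_type):
    """Build a _bulk request body from (doc_id, source) pairs.

    Each document becomes two lines: an "index" action line and the
    document source. The body must end with a newline.
    """
    lines = []
    for doc_id, source in docs:
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    if not lines:
        return ""
    return "\n".join(lines) + "\n"


def chunked(items, size):
    """Yield successive batches of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical documents, batched two at a time for illustration.
sample_docs = [("1", {"title": "a"}), ("2", {"title": "b"}), ("3", {"title": "c"})]
bodies = [build_bulk_body(batch, "my_index", "my_type") for batch in chunked(sample_docs, 2)]
```

Each body would then be sent as a single POST to `/_bulk`, so the cluster handles one request per batch instead of one request per document. A suitable batch size depends on document size and is best found by experiment.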

Thank you very much.