I have an ES 1.7.5 cluster with 20 servers. The maximum indexing performance of this cluster is about 6000-7000 docs/sec. The average document size is about 1.5 KB.
First (optional) question: what do you think about this speed?
For the upgrade to ES 2.3 I set up a cluster with another 20 servers and ran ES 2.3, but the maximum indexing performance of this cluster is only about 3000-3500 docs/sec. How can I improve the speed?
The cluster configurations are almost default, except that all string fields are not_analyzed and any server may become master.
What type of hardware are you using? What is the specification of the nodes? How are you ingesting data? What does the load on the servers look like while you are indexing? How many indices/shards are you actively indexing into?
Sorry - I see now the 1.5 KB average size. That works out to 9-10 MB/sec. This is very poor performance for 20 servers; it would mean each server takes only about 500 KB/sec.
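For reference, a quick sketch of the arithmetic behind those figures, using the numbers quoted above and ignoring the replica copies:

```python
# Back-of-the-envelope throughput math using the figures quoted in this thread.
docs_per_sec = 6500   # mid-point of the reported 6000-7000 docs/sec
avg_doc_kb = 1.5      # average document size in KB
servers = 20

cluster_mb_per_sec = docs_per_sec * avg_doc_kb / 1024
per_server_kb_per_sec = docs_per_sec * avg_doc_kb / servers

print("cluster:    ~%.1f MB/sec" % cluster_mb_per_sec)      # ~9.5 MB/sec
print("per server: ~%.0f KB/sec" % per_server_kb_per_sec)   # ~488 KB/sec
```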
This is probably the cause. The ES 2.x default for the translog changed from async to sync (an fsync on every request). If you set the following in your ES 2.3 cluster, do you get better performance?
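The setting in question is index.translog.durability. A minimal sketch of changing it dynamically with the Python elasticsearch client (the host and index name are placeholders; note the safety tradeoff discussed in the next reply):

```python
# Sketch: switch the translog back to async durability on an existing index.
# WARNING: with async durability you can lose the last few seconds of
# acknowledged writes if a node crashes.
from elasticsearch import Elasticsearch  # elasticsearch-py 2.x client assumed

es = Elasticsearch(["http://localhost:9200"])  # placeholder host

es.indices.put_settings(
    index="index1-2016.05.01",                 # placeholder daily index name
    body={"index.translog.durability": "async"},
)
```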
Let's be careful here. First, when making any change to index.translog.durability, it is immensely important to point out that the tradeoff is a loss of safety. Second, the performance reported here is so incredibly low that I'm skeptical the best way to get some of it back is by adjusting the translog sync. I suspect that performance is being left on the table somewhere else, and we should focus on understanding that.
Nodes spec:
20 servers per cluster
OS: Linux
CPU: Core i7-6700 CPU @ 3.40GHz
RAM: 64 GB (but 6 nodes in both clusters still have 48 GB - they are awaiting an upgrade)
SSD: no
HDD: software RAID0, 2 x 2 TB
ES heap size: 31 GB
Index:
The cluster contains 2 daily rotating indices (index1-YYYY.MM.DD and index2-YYYY.MM.DD)
Both indices have almost the same size.
5 shards per index with 1 replica (2 copies of the data)
Every daily index contains about 160,000,000-220,000,000 docs
Do not use software RAID. Most importantly, a poor disk I/O setup will throttle a powerful CPU like the i7-6700. Also, software RAID0 combined with disabled transaction durability is an invitation to data loss. You should use hardware RAID with file system settings optimized for maximum throughput. Also check whether the 2 TB drives are built for server performance tasks or for archival purposes.
You can double indexing speed with the following procedure: 1) create the new index with replica level 0, 2) bulk index, 3) raise the replica level to 1 before enabling search on that index in the application (see the sketch below).
But I think that alone still does not explain the poor performance.
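A minimal sketch of that procedure with the Python elasticsearch client (the host, index name, and document generator are placeholders for illustration):

```python
# Sketch: bulk index with 0 replicas first, then add the replica afterwards.
from elasticsearch import Elasticsearch, helpers  # elasticsearch-py 2.x client assumed

es = Elasticsearch(["http://localhost:9200"])  # placeholder host
index = "index1-2016.05.01"                    # placeholder daily index name

def generate_docs():
    # placeholder document source - replace with your real data feed
    for i in range(1000):
        yield {"message": "doc %d" % i}

# 1) create the new index with replica level 0
es.indices.create(index=index, body={"settings": {"number_of_replicas": 0}})

# 2) bulk index
actions = ({"_index": index, "_type": "doc", "_source": doc} for doc in generate_docs())
helpers.bulk(es, actions)

# 3) raise the replica level to 1 before enabling search on the index
es.indices.put_settings(index=index, body={"index.number_of_replicas": 1})
```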
How is your cluster set up: do you have data nodes only, or do you also use a dedicated master node? I have seen great performance since I started using a dedicated master node next to the data nodes.
Also, how much memory do you allocate to the Elasticsearch instances, i.e. what value do you use for ES_HEAP_SIZE?
Avoid RAID5/6 and prefer RAID 0/1/1+0. Enlarge the read-ahead settings with RAID 1+0, match the file system stripe size at creation time to the controller's read-ahead block settings, add mount options (for XFS: nobarrier,noatime,nodiratime), and tune the kernel I/O scheduler/elevator for high IOPS (a simpler queue, or even noop, may allow more throughput than deadline).
Most important: run your own benchmarks to make sure you find the optimal settings.