Elasticsearch 2.3 poor performance

Bit · August 5, 2016, 3:13pm

I have an ES 1.7.5 cluster with 20 servers. The maximum indexing performance of this cluster is about 6000-7000 docs/sec. The average document size is about 1.5 KB.

First (optional) question: what do you think about this speed?

To upgrade to ES 2.3, I set up a second cluster on another 20 servers running ES 2.3, but the maximum performance of this cluster is only about 3000-3500 docs/sec. How can I improve the speed?

The configurations of both clusters are almost default, except that all strings are not_analyzed and any server may become master.

Any ideas please.

Christian_Dahlqvist · August 5, 2016, 4:02pm

What type of hardware are you using? What is the specification of the nodes? How are you ingesting data? What does load on the servers look like while you are indexing? How many indices/shards are you actively indexing into?

jprante · August 5, 2016, 4:08pm

Can you specify this in MB/sec?

jprante · August 5, 2016, 4:18pm

Sorry - I see now the 1.5 KB avg size. That means 9-10 MB/sec. This is very bad performance for 20 servers; it would mean one server can take only about 500 KB/sec.
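
For readers who want to check the arithmetic, here is a minimal sketch using only the figures quoted in this thread (6000-7000 docs/sec, 1.5 KB per document, 20 servers). The function name and the choice of 1 MB = 1000 KB are assumptions made for illustration.

```python
# Back-of-the-envelope ingest rate from the figures quoted in this thread.
DOC_SIZE_KB = 1.5   # average document size reported by the original poster
NODES = 20          # servers in the cluster


def throughput(docs_per_sec, doc_size_kb=DOC_SIZE_KB, nodes=NODES):
    """Return (cluster MB/sec, per-node KB/sec), assuming 1 MB = 1000 KB."""
    cluster_kb = docs_per_sec * doc_size_kb
    return cluster_kb / 1000.0, cluster_kb / nodes


for rate in (6000, 7000):
    mb, per_node = throughput(rate)
    print(f"{rate} docs/sec -> {mb:.1f} MB/sec cluster, {per_node:.0f} KB/sec per node")
```

The per-node figure counts primary writes only; with 1 replica, each document is written twice across the cluster.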

tinle · August 5, 2016, 4:20pm

This is probably the cause. In ES 2.x the default translog durability changed from async to sync (the translog is fsynced on every request). If you set the following in your ES 2.3 cluster, do you get better performance?

index.translog.durability: async
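
As a rough sketch of how this setting might be applied to an existing index, the snippet below builds (but does not send) a settings update request. It assumes the setting can be changed on a live index through the index settings endpoint; the host `http://localhost:9200`, the index name `common-2016.08.08`, and the helper name are hypothetical. New daily indices would presumably need the same setting via an index template.

```python
import json
import urllib.request


def durability_request(base_url, index, durability="async"):
    """Build (not send) a PUT <index>/_settings request for translog durability."""
    if durability not in ("request", "async"):
        raise ValueError("durability must be 'request' or 'async'")
    body = json.dumps({"index.translog.durability": durability})
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=body.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical host and index name, for illustration only.
req = durability_request("http://localhost:9200", "common-2016.08.08")
print(req.get_method(), req.full_url, req.data.decode())
# Sending is left to the caller, e.g. urllib.request.urlopen(req)
```

With `async`, writes acknowledged since the last translog sync can be lost if a node crashes, which is the safety tradeoff discussed in the replies.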

jasontedor · August 5, 2016, 4:24pm

Let's take caution here. First, when making any change to the index.translog.durability, it's immensely important to point out that the tradeoff is a loss of safety. Second, the performance reported here is so incredibly low that I'm skeptical that the best solution to getting back some of the performance is by adjusting the translog sync. I suspect that performance is being left on the table somewhere else and we should focus on understanding that.

tinle · August 8, 2016, 4:19pm

It depends on the use case. For our use case, we're fine with taking that risk.

Agreed that the numbers reported are poor.

Bit · August 8, 2016, 4:27pm

The ES 2.3.3 cluster with this option processes 12000-18000 docs/sec.
Much better than ES 1.7.5 with 6000-7000 docs/sec.

I understand the risks of using this option.

jasontedor · August 8, 2016, 4:29pm

Yes, which is why it's important when you recommend someone turning off translog durability that you make them aware of the safety tradeoffs.

Bit · August 8, 2016, 4:50pm

Nodes spec:
20 servers per cluster
OS: Linux
CPU: Core i7-6700 CPU @ 3.40GHz
RAM: 64 GB (but 6 nodes in both clusters still have 48 GB; they are waiting for an upgrade)
SSD: no
HDD: software RAID 0, 2 x 2 TB
ES heap size: 31 GB

Index:
The cluster contains 2 daily rotated indices (index1-YYYY.MM.DD and index2-YYYY.MM.DD).
Both indices are almost the same size.
5 shards per index with 1 replica (2 copies of the data).
Every daily index contains about 160,000,000-220,000,000 docs.
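
To put these numbers next to the docs/sec figures above, here is a small sketch that converts the daily volume into an average rate and counts shard copies. It assumes the 160-220 million figure applies to each of the two indices and that ingestion is spread evenly over 24 hours, so real peaks may be higher; the function name is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_load(docs_per_index, indices=2, shards=5, replicas=1):
    """Return (average docs/sec over all indices, shard copies created per day)."""
    avg_rate = docs_per_index * indices / SECONDS_PER_DAY
    shard_copies = indices * shards * (1 + replicas)
    return avg_rate, shard_copies


for docs in (160_000_000, 220_000_000):
    rate, copies = daily_load(docs)
    print(f"{docs:,} docs/index/day -> {rate:,.0f} docs/sec average, {copies} shard copies")
```

Under these assumptions the average works out to roughly 3,700 to 5,100 docs/sec, with 20 shard copies per day across the 20 nodes.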

jprante · August 8, 2016, 5:44pm

Do not use software RAID. Most importantly, a poor disk I/O setup will hold back a powerful CPU like the i7-6700. Also, software RAID 0 plus disabled translog durability is an invitation to data loss. You should use hardware RAID with file system settings optimized for maximum throughput. Also check whether the 2 TB drives are built for server workloads or for archival purposes.

You can double indexing speed with the following procedure: 1) create the new index with replica level 0, 2) bulk index, 3) raise the replica level to 1 before enabling search on that index in the application.

But I think that alone still does not explain the poor performance.
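
The three-step replica procedure above could be sketched as the following sequence of requests. This only builds and prints the request bodies; the index name `index1-2016.08.09` is hypothetical, the bulk payload is a placeholder, and the helper names are made up for illustration.

```python
import json


def create_index_body(shards=5, replicas=0):
    """Body for PUT /<index>: create the index with the given replica count."""
    return json.dumps(
        {"settings": {"number_of_shards": shards, "number_of_replicas": replicas}}
    )


def set_replicas_body(replicas):
    """Body for PUT /<index>/_settings: change the replica count afterwards."""
    return json.dumps({"index": {"number_of_replicas": replicas}})


INDEX = "index1-2016.08.09"  # hypothetical index name
steps = [
    ("PUT", f"/{INDEX}", create_index_body(replicas=0)),        # 1) no replicas
    ("POST", "/_bulk", "<bulk payload goes here>"),             # 2) bulk index
    ("PUT", f"/{INDEX}/_settings", set_replicas_body(1)),       # 3) add the replica
]
for method, path, body in steps:
    print(method, path, body)
```

While the index has no replica, a single disk failure loses that day's data, so this fits best when the source data can be replayed.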

unknownunknown · August 9, 2016, 12:56pm

Could you clarify this? Are you talking about matching stripe sizes, etc?

Borrelworst · August 9, 2016, 2:44pm

How is your cluster set up? Do you have data nodes only, or do you also use a master node? I saw much better performance once I started using a master node alongside the data nodes.

Also, how much memory do you allocate to the Elasticsearch instances, so what is the value you use for ES_HEAP_SIZE ?

Bit · August 9, 2016, 2:58pm

I cannot use HW RAID.

I can use software RAID level 0 or level 1 across the two disks, or mount the two disks separately on each server.

Bit · August 9, 2016, 2:58pm

I use data nodes only. Any node may become master. ES_HEAP_SIZE=31G

Christian_Dahlqvist · August 9, 2016, 3:02pm

Are you using bulk indexing? If so, what bulk size do you use?

Bit · August 9, 2016, 3:07pm

logstash output config:

elasticsearch {
  hosts => ["XXX.XXX.XXX.XXX:9200"]
  index => "common-%{+YYYY.MM.dd}"
  timeout => 40
  flush_size => 2000
}
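
For context on what a flush_size of 2000 means per request, the sketch below estimates the payload size from the 1.5 KB average quoted earlier and shows the newline-delimited shape of a bulk body. The document type `logs`, the index name, and the sample documents are assumptions for illustration, not taken from the poster's setup.

```python
import json


def build_bulk_body(docs, index, doc_type="logs"):
    """Serialize docs into the newline-delimited body used by POST /_bulk."""
    lines = []
    for doc in docs:
        # Action line first, then the document source on its own line.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline


FLUSH_SIZE = 2000   # from the logstash config above
AVG_DOC_KB = 1.5    # average document size from the first post
print(f"approx. payload per bulk request: {FLUSH_SIZE * AVG_DOC_KB / 1000:.1f} MB")

body = build_bulk_body([{"message": "hello"}, {"message": "world"}], "common-2016.08.09")
print(body, end="")
```

The estimate ignores the action lines and any fields logstash adds, so actual requests will be somewhat larger.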

Borrelworst · August 9, 2016, 3:30pm

Can you try setting it up with one master node that contains no data? It can also improve the performance of your ES 1.7 cluster.

Damian_Pfister · August 11, 2016, 4:05pm

Any suggestion that ES (either version) is hitting any threadpool limits and dropping events (e.g. bulk rejections)?

What about logstash - any suggestion in the logs that it is not able to keep up?

The cat API gives great insight into this.

Do you have any monitoring solution to give metrics on performance (Marvel, ElasticHQ, elasticsearch-head)?
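
One way to look for bulk rejections is to parse the tabular text returned by a request such as `GET /_cat/thread_pool?v&h=host,bulk.active,bulk.queue,bulk.rejected`. The sketch below works on a made-up sample; the host names and numbers are invented, and the column names are assumed to match what the cluster returns.

```python
# Invented sample output; real values come from the cluster's cat API.
SAMPLE = """\
host    bulk.active bulk.queue bulk.rejected
node-01           4         12             0
node-02           4         50           317
"""


def rejected_nodes(cat_output):
    """Return {host: rejected_count} for nodes whose bulk.rejected is above zero."""
    lines = cat_output.strip().splitlines()
    header = lines[0].split()
    host_col = header.index("host")
    rej_col = header.index("bulk.rejected")
    result = {}
    for line in lines[1:]:
        fields = line.split()
        count = int(fields[rej_col])
        if count > 0:
            result[fields[host_col]] = count
    return result


print(rejected_nodes(SAMPLE))
```

A non-empty result would suggest that the bulk queue is overflowing on those nodes and that the client is sending faster than the cluster can index.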

jprante · August 11, 2016, 6:58pm

Avoid RAID 5/6, prefer RAID 0/1/1+0, enlarge read ahead settings with RAID 1+0, match stripe size of file system creation with controller settings of read ahead blocks, add mount options (for XFS nobarrier,noatime,nodiratime), tune kernel I/O scheduler/elevator for high IOPS (maybe queue or even noop allows more throughput than deadline).

Most important: run your benchmarks to be sure to find optimal settings.
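
Before benchmarking different I/O schedulers, it helps to record which one is active. A minimal sketch, assuming a Linux host that exposes the scheduler under `/sys/block/<device>/queue/scheduler` with the active entry in square brackets; the device name `sda` is a placeholder and the function names are made up for illustration.

```python
from pathlib import Path


def parse_scheduler(text):
    """Split a line such as 'noop deadline [cfq]' into (active, available)."""
    names = text.split()
    active = next((n.strip("[]") for n in names if n.startswith("[")), None)
    return active, [n.strip("[]") for n in names]


def current_scheduler(device="sda"):
    """Read the scheduler line for a block device; 'sda' is a placeholder."""
    return parse_scheduler(Path(f"/sys/block/{device}/queue/scheduler").read_text())


# Parsing a literal example line, so the sketch runs without touching /sys.
print(parse_scheduler("noop deadline [cfq]"))
```

Recording the active scheduler alongside each benchmark run makes it easier to attribute a throughput change to the setting that was varied.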
