Poor ingest node and indexing performance

jsyu32 · March 9, 2017, 10:59am

We are using Elasticsearch 5.1.2.
All bootstrap checks pass except system_call_filter, because we are running on CentOS 6.

Here is the setup:
Each document to be indexed has 1200 fields and is about 4 KB in size. All the fields are one of three types: date, keyword, or numeric. There are 24 million records in total.
We want to use Elasticsearch to do near real-time aggregations.
We have three general-purpose nodes (master: true; data: true; ingest: true by default) and one standalone ingest node (master: false; data: false; ingest: true). All nodes use the same JVM configuration, shown below. Each node has 64 GB of memory, an SSD, and 24 cores.

jvm.options:
-Xms24g
-Xmx24g
-XX:NewSize=8g
-XX:MaxNewSize=8g
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
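
To make the heap layout implied by these flags explicit, here is a small arithmetic sketch. It only restates the numbers above; it assumes "64GB" and "24g" can both be treated as GiB, and the function and key names are made up for illustration.

```python
# Values copied from the jvm.options above; node RAM is the 64 GB quoted in the post.
GIB = 1024 ** 3
NODE_RAM = 64 * GIB
HEAP = 24 * GIB        # -Xms24g / -Xmx24g
YOUNG_GEN = 8 * GIB    # -XX:NewSize=8g / -XX:MaxNewSize=8g
CMS_FRACTION = 75      # -XX:CMSInitiatingOccupancyFraction=75


def heap_layout(heap=HEAP, young=YOUNG_GEN, cms_fraction=CMS_FRACTION, ram=NODE_RAM):
    """Derive old-generation size and the CMS start threshold from the flags."""
    old_gen = heap - young  # whatever is not young generation is old generation
    return {
        "old_gen_gib": old_gen / GIB,
        # CMS starts a concurrent cycle once old-gen occupancy passes this level.
        "cms_start_gib": old_gen * cms_fraction / 100 / GIB,
        "heap_share_of_ram": heap / ram,
        "ram_outside_heap_gib": (ram - heap) / GIB,
    }


if __name__ == "__main__":
    for name, value in heap_layout().items():
        print(f"{name}: {value}")
```

With these flags the old generation is 16 GB and a CMS cycle starts at roughly 12 GB of old-generation occupancy, which is the context for the young and full GC activity asked about later.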

We load data into ES using the elasticsearch-spark DataFrame.saveToEs() method, which performs bulk inserts. We use 10 executors, which means 10 clients inserting in parallel into the ingest node.
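
For reference, a rough sketch of how many documents end up in each bulk request. It assumes the job runs with the documented es-hadoop defaults for es.batch.size.bytes (1mb) and es.batch.size.entries (1000), since the post does not say these were changed, and it ignores the small per-document action line in the bulk body. The function name is hypothetical.

```python
# Assumed es-hadoop defaults (not stated in the post): a task flushes its bulk
# buffer when either limit is reached.
DEFAULT_BATCH_BYTES = 1024 * 1024   # es.batch.size.bytes = 1mb
DEFAULT_BATCH_ENTRIES = 1000        # es.batch.size.entries = 1000

DOC_SIZE_BYTES = 4 * 1024           # "near 4K" per document, from the post


def docs_per_bulk(doc_size_bytes,
                  batch_bytes=DEFAULT_BATCH_BYTES,
                  batch_entries=DEFAULT_BATCH_ENTRIES):
    """Approximate documents per bulk request: whichever limit is hit first."""
    by_bytes = batch_bytes // doc_size_bytes
    return max(1, min(batch_entries, by_bytes))


if __name__ == "__main__":
    per_bulk = docs_per_bulk(DOC_SIZE_BYTES)
    print("docs per bulk request:", per_bulk)
    # Bulk requests per second needed to sustain the observed 1000 docs/s.
    print("bulk requests per second at 1000 docs/s:", 1000 / per_bulk)
```

Under these assumptions the byte limit is reached first, so each bulk request carries roughly 256 documents rather than 1000.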
Here is the performance we got.
Ingest node: (screenshot not preserved)

Detail: (screenshot not preserved)

One of the data nodes: (screenshot not preserved)

With 4 nodes, we only got an index rate of about 1000 documents/s, with about 4 ms to index one document.
CPU usage on all nodes is under 10%, so CPU should not be the bottleneck. We see no throttling, bulk queueing, bulk rejections, index queueing, index rejections, or other issues.
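
To put the observed rate in perspective, here is a back-of-the-envelope calculation using only the figures quoted in this post (nothing here is measured separately). It treats "near 4K" as 4096 bytes and assumes the three shards are spread evenly over the three data nodes; the names are illustrative.

```python
# Figures quoted in the post.
DOCS_PER_SEC = 1000          # observed cluster-wide index rate
DOC_SIZE_BYTES = 4 * 1024    # "near 4K" per document
FIELDS_PER_DOC = 1200
TOTAL_DOCS = 24_000_000
DATA_NODES = 3               # one primary shard per data node, no replicas


def throughput_summary():
    """Translate the observed document rate into other units."""
    return {
        "mb_per_sec": DOCS_PER_SEC * DOC_SIZE_BYTES / (1024 * 1024),
        "field_values_per_sec": DOCS_PER_SEC * FIELDS_PER_DOC,
        "docs_per_sec_per_data_node": DOCS_PER_SEC / DATA_NODES,
        "hours_for_full_load": TOTAL_DOCS / DOCS_PER_SEC / 3600,
    }


if __name__ == "__main__":
    for name, value in throughput_summary().items():
        print(f"{name}: {value:.2f}")
```

So the cluster is taking in only about 4 MB/s of source data, and loading all 24 million records at this rate would take a bit under 7 hours.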

Increasing the number of clients makes performance worse.

Index settings:
"settings": {
  "index": {
    "number_of_shards": 3,
    "number_of_replicas": 0,
    "mapping.total_fields.limit": 1500,
    "refresh_interval": "60s"
  }
}

Cluster settings:
{
  "persistent": {
    "indices": {
      "store": {
        "throttle": {
          "max_bytes_per_sec": "100mb"
        }
      }
    }
  },
  "transient": {}
}

Other settings are default.

My question is why indexing performance is so poor (only 1000 documents/s), and why the ingest node is under such heavy load, with so many young GCs and even full GCs.
According to the documentation, an ingest node:
• intercepts bulk and index requests
• applies the transformations
• passes the documents back to the index or bulk APIs
We didn't use a pipeline.

system · April 6, 2017, 10:59am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
