Slowness in ES bulk inserts

Chirag_Patwa · November 27, 2020, 2:00pm

We are running behind the schedule to achieve something and we need help. Please take a look and suggest something.

Issue: Slowness on bulk Indexing.
For 10000 records its taking 27560 ms
For 500 records its taking about 4448 ms

We want to index about 100,000 documents and its taking 372023 ms which is too much.
We tried different bulk size from 500, 20000,100000 but not able to achieve desired result.
We tried with JestClient, RestHighLevelClient, BulkProcessor but nothing is helping

System Configurations:

OS: Linux (debian 9.12)
Standard DS2 v2 (2 vcpus, 7 GiB memory)
3 nodes configured

jvm.options:
-Xms5g
-Xmx5g
-XX:+UseG1GC
-XX:+UseStringDeduplication
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly

elasticsearch.yml

 // # ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
#       Before you set out to tweak and tune the configuration, make sure you
#       understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
cluster.name: hydroperformance
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: hydoperfes0
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
path.data: /opt/bitnami/elasticsearch/data
#
# Path to log files:
#
#path.logs: /path/to/logs
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
network.host: 0.0.0.0
#
# Set a custom port for HTTP:
#
#http.port: 9200
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when new node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
discovery.zen.ping.unicast.hosts: ["hydoperfes0","hydoperfes1","hydoperfes2"]
#
# Prevent the "split brain" by configuring the majority of nodes (total number of master-eligible nodes / 2 + 1):
#
discovery.zen.minimum_master_nodes: 2
#
# For more information, consult the zen discovery module documentation.
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
gateway.recover_after_nodes: 3
#
# For more information, consult the gateway module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true

transport.tcp.port: 9300
network.publish_host: 10.0.0.6
discovery.initial_state_timeout: 5m
gateway.expected_nodes: 3
indices.memory.index_buffer_size: 30%

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Thanks In advance

Christian_Dahlqvist · November 27, 2020, 2:46pm

You should not set this higher than 50% of available RAM, which in this case is 3.5GB.

I strongly recommend using the default settings and not override these expert level settings.

Have you identified what is limiting throughput? Assuming you are not seeing long or frequent GC it is often either CPU usage or disk I/O, so it is important to use fast storage. See this section in the docs for more information and tips. Also make sure you are indexing into enough shards to get all the nodes involved. If you index into a single index with the default 1 primary shard only one or two nodes will be doing work.

egalpin · November 27, 2020, 9:17pm

In the docs linked by @Christian_Dahlqvist it briefly mentions "unsetting" refreshes, but doesn't expand. One can disable refresh entirely by setting refresh_interval=-1 on an index's settings if you need to perform a 1-time large-scale ingestion event. Docs will be largely unsearchable during until refresh is re-enabled, but it can speed things up for initial data loads.

leandrojmp · November 28, 2020, 1:18am

Are you on Azure, correct? Is your storage using SSD? Have you tested to see if how many IOPS it is getting?

I had a problem a couple of years ago on Azure that even using the Premium SSD based storage the IOPS were pretty low, only solved after opening a ticket to support to check the hardware behind it, which was failing.

Chirag_Patwa · November 28, 2020, 3:46am

I tried disabling it from settings API on index, that didn't help in our case

Chirag_Patwa · November 28, 2020, 3:48am

Yes. its on Azure portal. I will check on SSD and IOPS. I will post my findings

Chirag_Patwa · November 28, 2020, 3:49am

I will try setting 50% RAM and reverting other configurations to see how it behaves.

Chirag_Patwa · November 30, 2020, 9:53am

@Christian_Dahlqvist, We tried following as you suggested

Did this change didn't get any improvement for indexing about 500 document performance was the same its was taking average 3seconds.

Reverted this still no gain.

We have by 5 shards configured. Also, followed the link and did following:

Changed setting for a index , set refresh_interval to 90s and did indexing but no performance gain.
Used multiple worker thread to index, but each individual thread was taking average 3 seconds for 500 documents.
We just have 1 replicas.

Just for your information we have unmanaged standard HDD, do you think that might be slowing us down?

Chirag_Patwa · November 30, 2020, 9:55am

We are using unmanaged Standard HDD. Do you think that will not give us milliseconds response time for indexing 500 documents?

Christian_Dahlqvist · November 30, 2020, 10:04am

Storage performance is often the limiting factor. I would recommend you try with premium storage and see what impact that has.

leandrojmp · November 30, 2020, 2:32pm

Probably not, those unmanaged Standard HDD are pretty slow for elasticsearch.

You should try the Premium tier, even the unmanaged premium tier is better than the HDD, they offer better speed, but they are also more expensive.

system · December 28, 2020, 2:32pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Elasticsearch bulk slows down after a certain amount of documents Elasticsearch	4	1288	April 24, 2020
Slow bulk indexing Elasticsearch	4	2080	July 5, 2017
Bulkload performance issue Elasticsearch	2	377	September 14, 2019
Bulk Insert Throughput Issues Elasticsearch	2	312	July 6, 2017
Bulk inserting is slow Elasticsearch	14	16091	July 6, 2017

Slowness in ES bulk inserts

Related topics