Bulk insert is too slow

I have a 2-node Elasticsearch cluster (version 6.8) running on two Ubuntu servers with 4 GB RAM each, and I am running MySQL on the same servers.

The bulk insert writes 100 documents at a time, and each batch takes around 15 minutes.
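For reference, a 100-document batch goes to Elasticsearch as a single `_bulk` request in NDJSON format. A minimal sketch of building such a request body (the index name `myindex`, the field, and the host/credentials in the commented `curl` line are illustrative, not taken from this cluster):

```shell
# Build an NDJSON bulk body: one action line + one document line per document.
BULK_FILE=$(mktemp)
for i in 1 2 3; do
  printf '{"index":{"_index":"myindex","_type":"_doc","_id":"%s"}}\n' "$i" >> "$BULK_FILE"
  printf '{"field1":"value %s"}\n' "$i" >> "$BULK_FILE"
done
cat "$BULK_FILE"
# Then POST it (hypothetical host and credentials; note the NDJSON content type):
# curl -s -u elastic:password -H 'Content-Type: application/x-ndjson' \
#   -XPOST 'http://172.31.5.1:9200/_bulk' --data-binary "@$BULK_FILE"
```

Even on slow storage, a 100-document bulk request should normally complete in seconds, not minutes, which is what makes 15 minutes per batch so suspicious.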

This is my elasticsearch.yml

# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
#       Before you set out to tweak and tune the configuration, make sure you
#       understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
cluster.name: test-es-cluster
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: data-a
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
path.data: /var/lib/elasticsearch
#
# Path to log files:
#
path.logs: /var/log/elasticsearch
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
network.host: 172.31.5.1
#
# Set a custom port for HTTP:
#
#http.port: 9200
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when new node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
discovery.zen.ping.unicast.hosts: ["172.31.5.1", "172.31.32.160"]
#
# Prevent the "split brain" by configuring the majority of nodes (total number of master-eligible nodes / 2 + 1):
#
discovery.zen.minimum_master_nodes: 2
#
# For more information, consult the zen discovery module documentation.
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
gateway.recover_after_nodes: 2
#
# For more information, consult the gateway module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
action.destructive_requires_name: true

xpack.security.enabled: true
xpack.security.authc.accept_default_password: false

# it is mandatory to enable SSL when security is enabled
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.keystore.path: /etc/elasticsearch/cert/elastic-certificates.p12
xpack.security.transport.ssl.truststore.path: /etc/elasticsearch/cert/elastic-certificates.p12

And here is: /etc/default/elasticsearch

################################
# Elasticsearch
################################

# Elasticsearch home directory
#ES_HOME=/usr/share/elasticsearch

# Elasticsearch Java path
JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64

# Elasticsearch configuration directory
ES_PATH_CONF=/etc/elasticsearch

# Elasticsearch PID directory
#PID_DIR=/var/run/elasticsearch

# Additional Java OPTS
#ES_JAVA_OPTS=

# Configure restart on package upgrade (true, every other setting will lead to not restarting)
RESTART_ON_UPGRADE=true

################################
# Elasticsearch service
################################

# SysV init.d
#
# The number of seconds to wait before checking if Elasticsearch started successfully as a daemon process
ES_STARTUP_SLEEP_TIME=5

################################
# System properties
################################

# Specifies the maximum file descriptor number that can be opened by this process
# When using Systemd, this setting is ignored and the LimitNOFILE defined in
# /usr/lib/systemd/system/elasticsearch.service takes precedence
MAX_OPEN_FILES=65535

# The maximum number of bytes of memory that may be locked into RAM
# Set to "unlimited" if you use the 'bootstrap.memory_lock: true' option
# in elasticsearch.yml.
# When using systemd, LimitMEMLOCK must be set in a unit file such as
# /etc/systemd/system/elasticsearch.service.d/override.conf.
#MAX_LOCKED_MEMORY=unlimited

# Maximum number of VMA (Virtual Memory Areas) a process can own
# When using Systemd, this setting is ignored and the 'vm.max_map_count'
# property is set at boot time in /usr/lib/sysctl.d/elasticsearch.conf
#MAX_MAP_COUNT=262144
START_DAEMON=true
ES_HEAP_SIZE=2g

Any suggestions to improve the performance? Search queries seem to be running fine...

The recommendation is to assign 50% of RAM to the Elasticsearch heap, but that assumes Elasticsearch has the host to itself, which is not the case for you. What type of storage are you using? What does iowait look like while you are indexing? Do you have swap enabled?


Thanks, Christian, for your reply.

The Ubuntu servers are hosted on AWS; the storage type is SSD (General Purpose gp2). This is a test setup, so I am using cheaper resources.

I don't know how to check iowait, though.

I don't have swap enabled.


I reduced the heap size to 1GB, but it is still very slow.

I have set the following limits in /etc/security/limits.conf:

# add the following at the end
*       soft    nofile      64000
*       hard    nofile      64000
root    soft    nofile      64000
root    hard    nofile      64000
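You can verify what limit a shell actually gets; note that under systemd, the Elasticsearch service takes its limit from `LimitNOFILE` in the unit file, not from `/etc/security/limits.conf` (as the comments in `/etc/default/elasticsearch` also point out). A quick check:

```shell
# Soft and hard open-file limits for the current shell session.
ulimit -Sn
ulimit -Hn
# For the running Elasticsearch process, inspect /proc/<pid>/limits, e.g.
# (the pgrep pattern is illustrative):
# grep 'open files' /proc/$(pgrep -f org.elasticsearch)/limits
```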

I have set the heap size to 1 GB in /etc/elasticsearch/jvm.options:

-Xms1g
-Xmx1g

and in /etc/default/elasticsearch I have:

JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64
ES_PATH_CONF=/etc/elasticsearch
RESTART_ON_UPGRADE=true
ES_STARTUP_SLEEP_TIME=5
MAX_OPEN_FILES=65535

# DO I NEED THE FOLLOWING TWO VALUES?
START_DAEMON=true
ES_HEAP_SIZE=1g

Could it be slow because I am using 2 nodes in the cluster instead of 3?

What size is your gp2 EBS volume?


It's 50 GB.

gp2 EBS volumes get IOPS provisioned in proportion to their size (I think it is 3 IOPS per GB), so small volumes can be very slow. Given the size of your volume, and the fact that you are running MySQL on it as well, I suspect poorly performing storage is causing the slow indexing. Look at iowait on the nodes, e.g. using iostat, and I suspect you will find it quite high.
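The arithmetic behind this can be sketched as follows; the 3 IOPS/GiB baseline, the 100 IOPS floor, and the 3000 IOPS burst ceiling are from AWS's gp2 volume documentation:

```shell
# gp2 baseline = 3 IOPS per GiB, with a 100 IOPS minimum; volumes under
# 1 TiB can also burst to 3000 IOPS while burst credits last.
VOL_GB=50
BASELINE=$((VOL_GB * 3))
if [ "$BASELINE" -lt 100 ]; then BASELINE=100; fi
echo "baseline IOPS for ${VOL_GB} GiB gp2: $BASELINE"
# prints: baseline IOPS for 50 GiB gp2: 150
```

150 baseline IOPS shared between MySQL and Elasticsearch leaves very little headroom once any burst credits are exhausted.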


This is what I get running iostat (not quite sure what it means):

Linux 5.3.0-1035-aws (ip-my-ip-address)         07/15/21        _x86_64_        (2 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5.58    0.08    1.96    0.40    1.93   90.05

Device             tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
loop0             0.11         2.08         0.00       1070          0
loop1             0.09         0.67         0.00        346          0
loop2             0.11         0.70         0.00        359          0
loop3             0.08         0.67         0.00        347          0
loop4            23.00        23.59         0.00      12129          0
loop5            22.57        24.55         0.00      12625          0
loop6             0.01         0.01         0.00          4          0
xvda             41.57      1083.25       578.30     557071     297396

Was that iostat taken while the system was ingesting and slow?


@Christian_Dahlqvist : yes it was...

I rebuilt the server using io1 instead of gp2 and increased the IOPS. I also reduced the heap size to 1.5 GB (the server has 4 GB RAM and runs both MySQL and Elasticsearch).

It's running fine now... I think it must have been the IO, as you pointed out. Thanks for your help.