We are behind schedule on an indexing task and need help. Please take a look and suggest improvements.
Issue: slowness on bulk indexing.

- Indexing 10,000 records takes 27,560 ms (~363 docs/s); 500 records take about 4,448 ms.
- We want to index about 100,000 documents, and it takes 372,023 ms (~269 docs/s), which is far too slow.
- We tried different bulk sizes (500, 20,000, 100,000) but could not reach the desired throughput.
- We tried JestClient, RestHighLevelClient, and BulkProcessor, but nothing helped.
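For context, our BulkProcessor wiring looks roughly like the sketch below (written against the RestHighLevelClient; the `client` instance, the `myindex` name, the `documents` collection, and the batch settings are placeholders, not our exact values):

```java
import org.elasticsearch.action.bulk.BulkProcessor;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.common.unit.ByteSizeUnit;
import org.elasticsearch.common.unit.ByteSizeValue;
import org.elasticsearch.common.xcontent.XContentType;
import java.util.concurrent.TimeUnit;

// Report bulk failures instead of silently dropping them.
BulkProcessor.Listener listener = new BulkProcessor.Listener() {
    @Override public void beforeBulk(long id, BulkRequest request) { }
    @Override public void afterBulk(long id, BulkRequest request, BulkResponse response) {
        if (response.hasFailures()) {
            System.err.println(response.buildFailureMessage());
        }
    }
    @Override public void afterBulk(long id, BulkRequest request, Throwable failure) {
        failure.printStackTrace();
    }
};

BulkProcessor processor = BulkProcessor.builder(
        (request, bulkListener) ->
            client.bulkAsync(request, RequestOptions.DEFAULT, bulkListener),
        listener)
    .setBulkActions(5_000)                               // flush every 5,000 actions
    .setBulkSize(new ByteSizeValue(5, ByteSizeUnit.MB))  // or every 5 MB, whichever first
    .setConcurrentRequests(2)                            // allow 2 in-flight bulk requests
    .build();

for (String json : documents) {
    processor.add(new IndexRequest("myindex").source(json, XContentType.JSON));
}
processor.awaitClose(30, TimeUnit.SECONDS);              // flush remaining docs and wait
```

With `setConcurrentRequests(0)` the processor degrades to fully synchronous bulks, which is the behavior we want to avoid.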
System Configurations:
- OS: Linux (debian 9.12)
- Standard DS2 v2 (2 vcpus, 7 GiB memory)
- 3 nodes configured
jvm.options:
-Xms5g
-Xmx5g
-XX:+UseG1GC
-XX:+UseStringDeduplication
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
elasticsearch.yml:
# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
# Before you set out to tweak and tune the configuration, make sure you
# understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
cluster.name: hydroperformance
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: hydoperfes0
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
path.data: /opt/bitnami/elasticsearch/data
#
# Path to log files:
#
#path.logs: /path/to/logs
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
network.host: 0.0.0.0
#
# Set a custom port for HTTP:
#
#http.port: 9200
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when new node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
discovery.zen.ping.unicast.hosts: ["hydoperfes0","hydoperfes1","hydoperfes2"]
#
# Prevent the "split brain" by configuring the majority of nodes (total number of master-eligible nodes / 2 + 1):
#
discovery.zen.minimum_master_nodes: 2
#
# For more information, consult the zen discovery module documentation.
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
gateway.recover_after_nodes: 3
#
# For more information, consult the gateway module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true
transport.tcp.port: 9300
network.publish_host: 10.0.0.6
discovery.initial_state_timeout: 5m
gateway.expected_nodes: 3
indices.memory.index_buffer_size: 30%
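
Besides the node-level settings above, the per-index settings `index.refresh_interval` and `index.number_of_replicas` also affect bulk throughput and can be toggled at load time. A sketch of that toggle via the RestHighLevelClient (the `client` instance and `myindex` name are placeholders; `1s` and `1` are Elasticsearch's defaults):

```java
import org.elasticsearch.action.admin.indices.settings.put.UpdateSettingsRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.common.settings.Settings;

// Before the bulk load: pause refreshes and drop replicas on the target index.
UpdateSettingsRequest before = new UpdateSettingsRequest("myindex");
before.settings(Settings.builder()
    .put("index.refresh_interval", "-1")
    .put("index.number_of_replicas", 0));
client.indices().putSettings(before, RequestOptions.DEFAULT);

// ... run the bulk load ...

// After the load: restore refresh and replication.
UpdateSettingsRequest after = new UpdateSettingsRequest("myindex");
after.settings(Settings.builder()
    .put("index.refresh_interval", "1s")
    .put("index.number_of_replicas", 1));
client.indices().putSettings(after, RequestOptions.DEFAULT);
```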
Thanks in advance.