Slow Indexing speed / Bottleneck

Hi Experts,

I know there are many existing threads on this topic, but I haven't been able to fix my issue with them.

I am new to this.

Our ES cluster has 4 data nodes, 1 master node, and 1 ingest node.

The index rate is currently 35k per minute. I tried to increase the indexing speed by adding an additional data node, but that didn't help.

I have configured my OS as per the official documentation and done some tuning for faster indexing (index.refresh_interval = 30s etc.; a sketch of how I applied it is below). Could you please help me identify the bottleneck(s), if any?
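
For reference, the refresh interval change was applied as a dynamic index setting, roughly like this (the `filebeat-*` pattern is just an example matching my Filebeat indices):

```
PUT filebeat-*/_settings
{
  "index": {
    "refresh_interval": "30s"
  }
}
```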

ES version: 7.8.1

Data is being sent to ES by multiple Filebeat agents (the default Filebeat index template is being used).

I have removed all unnecessary nested fields but there has hardly been any impact.

Total Shards: 10, Replicas: 0

Even though it most likely is not related to your performance problems, I would like to point out that having a single master-eligible node is bad. You should always aim to have at least 3 master-eligible nodes in a cluster, as that adds resiliency and reduces the risk of catastrophic failure.

Some additional information about your cluster is required though. What is the hardware profile of the different nodes? What type of storage are you using? What does CPU and memory usage look like on the nodes? Is there any evidence of slow or long GC in the logs on any of the nodes?
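
If it helps, a quick per-node overview of CPU, load, RAM and heap can be pulled with the cat nodes API, something like:

```
GET _cat/nodes?v&h=name,node.role,cpu,load_1m,ram.percent,heap.percent,heap.max
```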

Hardware profile of the nodes:

Data: 2 nodes (reduced from 4 earlier), each with 7.5 GB RAM + 140 GB SSD ---- Avg CPU utilisation: 17%
Ingest: 7.5 GB RAM + 30 GB HDD ---- Avg CPU utilisation: 5%
Master: 3.5 GB RAM + 30 GB HDD ---- Avg CPU utilisation: 1% (the 17% I posted earlier was incorrect)

I do not know how to interpret the gc.log, but I am pasting a snippet below. It is almost identical across all data nodes.

> [2020-08-18T09:45:04.881+0000][1935][gc,phases ] GC(213) Pre Evacuate Collection Set: 0.7ms
> [2020-08-18T09:45:04.881+0000][1935][gc,phases ] GC(213) Merge Heap Roots: 0.4ms
> [2020-08-18T09:45:04.881+0000][1935][gc,phases ] GC(213) Evacuate Collection Set: 44.2ms
> [2020-08-18T09:45:04.881+0000][1935][gc,phases ] GC(213) Post Evacuate Collection Set: 2.7ms
> [2020-08-18T09:45:04.881+0000][1935][gc,phases ] GC(213) Other: 0.7ms
> [2020-08-18T09:45:04.881+0000][1935][gc,heap ] GC(213) Eden regions: 1803->0(1807)
> [2020-08-18T09:45:04.881+0000][1935][gc,heap ] GC(213) Survivor regions: 40->36(231)
> [2020-08-18T09:45:04.881+0000][1935][gc,heap ] GC(213) Old regions: 245->246
> [2020-08-18T09:45:04.881+0000][1935][gc,heap ] GC(213) Archive regions: 2->2
> [2020-08-18T09:45:04.881+0000][1935][gc,heap ] GC(213) Humongous regions: 102->102
> [2020-08-18T09:45:04.881+0000][1935][gc,metaspace ] GC(213) Metaspace: 107472K(115604K)->107472K(115604K) NonClass: 94095K(98964K)->94095K(98964K) Class: 13376K(16640K)->13376K(16640K)
> [2020-08-18T09:45:04.881+0000][1935][gc ] GC(213) Pause Young (Normal) (G1 Evacuation Pause) 2190M->383M(3072M) 48.811ms
> [2020-08-18T09:45:04.881+0000][1935][gc,cpu ] GC(213) User=0.06s Sys=0.00s Real=0.05s
> [2020-08-18T09:45:04.881+0000][1935][safepoint ] Safepoint "G1CollectForAllocation", Time since last: 1324632764 ns, Reaching safepoint: 335724 ns, At safepoint: 48939017 ns, Total: 49274741 ns
> [2020-08-18T09:45:05.882+0000][1935][safepoint ] Safepoint "Cleanup", Time since last: 1000127915 ns, Reaching safepoint: 229413 ns, At safepoint: 5627 ns, Total: 235040 ns
> [2020-08-18T09:45:06.882+0000][1935][safepoint ] Safepoint "Cleanup", Time since last: 1000163761 ns, Reaching safepoint: 231265 ns, At safepoint: 4832 ns, Total: 236097 ns
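
In case more detail is needed, I can also pull heap usage and GC counters from the node stats API instead of the raw log, something along these lines:

```
GET _nodes/stats/jvm?filter_path=nodes.*.name,nodes.*.jvm.mem.heap_used_percent,nodes.*.jvm.gc.collectors
```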

@Anirbaan_Chowdhury Maybe it's not an indexing speed issue but rather an ingestion speed issue. It probably depends on the ingest pipeline complexity, but I know that in my production stack, 1 ingest node is not enough.
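
A rough way to see how much time the ingest node spends in its pipelines is the ingest section of node stats, for example:

```
GET _nodes/stats/ingest?filter_path=nodes.*.name,nodes.*.ingest.total,nodes.*.ingest.pipelines
```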

@willemdh, you are right. I doubled the capacity of the ingest node and the indexing speed increased considerably.

Anything else you would like to recommend to further improve the indexing speed?
