Indexing performance drop after a few hours

Hi,
I'm having an issue with Elasticsearch: after a few hours of running (2-3 hours) there is a massive performance drop (50%-60%), but no obvious symptoms as to why.

Setup:
Three nodes: one master and two data nodes
Elasticsearch version: 7.12.0
Configuration: All recommended settings are in place (e.g., ulimits)
Purpose: Heavy Indexing Operation

CPU, memory, heap, file descriptors, thread count, hot threads, and merge/delete/refresh stats all look normal (no increase), and my application runs fine if I disable Elasticsearch, so the problem seems to be on the Elasticsearch side. I have also tried various heap sizes, node memory settings, and refresh intervals, but none of them make any difference, and there is nothing in the log files (log level: info).
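
For reference, these counters can all be pulled from the node stats API, roughly like this (a minimal sketch using Python requests; the host is a placeholder for my cluster):

```python
import time
import requests

ES = "http://localhost:9200"  # placeholder; adjust host/credentials for your cluster

def snapshot():
    # _nodes/stats exposes the indexing, merge and JVM counters being watched
    stats = requests.get(f"{ES}/_nodes/stats/indices,jvm").json()
    for node in stats["nodes"].values():
        indexing = node["indices"]["indexing"]
        merges = node["indices"]["merges"]
        heap = node["jvm"]["mem"]["heap_used_percent"]
        print(node["name"],
              "index_total:", indexing["index_total"],
              "index_time_ms:", indexing["index_time_in_millis"],
              "merge_time_ms:", merges["total_time_in_millis"],
              "heap_used_%:", heap)

while True:
    snapshot()
    time.sleep(60)  # one sample per minute over a few hours
```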

I also looked at the wire statistics using Wireshark. When the performance drops, it seems to take longer for the responses to come back (http.time), but there is no change in the RTT between the nodes.
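
A cheap client-side probe can double-check this independently of Wireshark, e.g. (sketch; the host is a placeholder, and it only measures how quickly a node answers a lightweight request, not the bulk path itself):

```python
import time
import requests

ES = "http://localhost:9200"  # placeholder

# If this stays flat while bulk responses slow down, the delay is in the
# indexing path rather than the network or general node responsiveness.
t0 = time.perf_counter()
requests.get(f"{ES}/_cluster/health")
elapsed_ms = (time.perf_counter() - t0) * 1000
print(f"cluster health responded in {elapsed_ms:.1f} ms")
```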

PS: I'm new to Elasticsearch.

Any help is appreciated.
Thanks

Having a single master eligible node is bad as it makes that node a single point of failure. I would recommend making the data nodes master eligible as well for improved resilience.

Some additional information would be useful:

  • How many indices and shards are you actively indexing into?
  • How many concurrent indexing processes/threads are you using?
  • What is the bulk size and average size of the indexed documents?
  • Are you assigning document IDs before indexing or allowing Elasticsearch to define the IDs?

Indexing in Elasticsearch can often be very I/O intensive. What type of hardware are you using? What type of storage are you using? Local SSDs?
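
If it helps, the indices/shards and the write thread pool activity can be read straight off the cat APIs, for example (a minimal sketch against the standard REST endpoints; the host is a placeholder):

```python
import requests

ES = "http://localhost:9200"  # placeholder

# Indices and shard counts you are actively indexing into
print(requests.get(f"{ES}/_cat/indices?v&h=index,pri,rep,docs.count,store.size").text)

# Write thread pool activity per node (active / queued / rejected bulk requests)
print(requests.get(f"{ES}/_cat/thread_pool/write?v&h=node_name,active,queue,rejected").text)
```

Queue growth or rejections in the write pool are usually a sign the cluster cannot keep up with the bulk load.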

Having a single master eligible node is bad as it makes that node a single point of failure. I would recommend making the data nodes master eligible as well for improved resilience.

Both data nodes are eligible to be the master node

  • How many indices and shards are you actively indexing into?

Three indices (two heavy, one light), with two shards per index (one primary and one replica)

  • How many concurrent indexing processes/threads are you using?

1-3 active threads per node

  • What is the bulk size and average size of the indexed documents?

About 400KB per bulk, with around 4.3K index requests

  • Are you assigning document IDs before indexing or allowing Elasticsearch to define the IDs?

No, IDs are assigned by Elasticsearch
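
For completeness, a bulk request where Elasticsearch assigns the IDs looks roughly like this: the action line simply carries no _id (simplified sketch; the index name and fields are placeholders):

```python
import requests

ES = "http://localhost:9200"  # placeholder

# Newline-delimited bulk body; each action line omits "_id",
# so Elasticsearch auto-generates the document IDs.
body = (
    '{"index": {"_index": "heavy-index-1"}}\n'
    '{"field1": "value", "timestamp": "2021-05-01T12:00:00Z"}\n'
    '{"index": {"_index": "heavy-index-1"}}\n'
    '{"field1": "another value", "timestamp": "2021-05-01T12:00:01Z"}\n'
)
resp = requests.post(f"{ES}/_bulk", data=body,
                     headers={"Content-Type": "application/x-ndjson"})
print("errors:", resp.json()["errors"])
```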

AWS SSDs. The disk usage shows no anomalous activity

What type of instances are you using? Exactly what type of AWS SSD storage are you using? Are you monitoring iowait on the data nodes?

gp2 storage and NVMe drives. There is nothing strange in the iowait on the data nodes.

Another point: if fewer index requests are sent, it just takes longer to reach the same performance drop
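
In case it is something cumulative (segments piling up and being merged), the per-index merge stats can be sampled over time, roughly like this (sketch; the index name is a placeholder):

```python
import requests

ES = "http://localhost:9200"  # placeholder

# Merge totals for one of the heavy indices; sampled periodically,
# a steadily growing total_time_in_millis points at merge pressure.
stats = requests.get(f"{ES}/heavy-index-1/_stats/merges").json()
merges = stats["_all"]["primaries"]["merges"]
print("merges:", merges["total"],
      "merge time (ms):", merges["total_time_in_millis"],
      "currently running:", merges["current"])
```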

What storage are the data nodes using? If gp2, what is the size of the volumes?

gp2, 100GB per node

m5.2xlarge

gp2 EBS volumes, as far as I recall, get IOPS allocated based on size. I think it is 3 IOPS per GB, so I believe your disks would only support around 300 IOPS, which is not a lot. This could very well be the limiting factor once indices grow and larger, I/O-intensive merges take place. You may want to test upgrading or increasing the size of your storage to see if that makes a difference.
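
As a rough back-of-the-envelope check (assuming that ~3 IOPS per GB baseline; the exact figures, including the minimum baseline and burst credits, are in the AWS EBS documentation):

```python
# Baseline gp2 IOPS under the assumed 3 IOPS per provisioned GB rule of thumb
volume_size_gb = 100
print(f"{volume_size_gb} GB gp2 -> ~{3 * volume_size_gb} baseline IOPS")

# For comparison, sizes you might grow to
for size in (300, 1000):
    print(f"{size} GB gp2 -> ~{3 * size} baseline IOPS")
```

If the baseline is the bottleneck, it should also be visible in the volume's CloudWatch metrics (queue length/throughput) around the time of the drop.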

Thank you very much. I will upgrade the size and see if that makes any difference.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.