New Document Indexing Performance Troubleshooting

mfalkenstein · June 26, 2023, 12:40am

So I've been trying to troubleshoot an issue with my Elasticsearch cluster, which is currently used in a production system. Our servers are hosted in AWS and the specs of each node are Standard D16s v3 (16 vCPUs, 64 GiB memory, 1.5TB of SSD storage), and the cluster has 30 nodes.

We're using NiFi to index new records into ES, and the NiFi PutElasticsearchHTTPRecord processor configuration is:

Until recently, our Elastic system was handling about 8TB of data without much issue. However, in the last two months or so we've taken on more data, bringing the total indexed data to about 18TB. Since then, the speed at which new records get indexed into ES has dropped from about 4GB/min to about 20MB/min.

I've changed a few settings to try to free up CPU/memory for indexing the new records. Here are the new settings:

PUT /<all of my data indices>/_settings
{
  "index.blocks.read_only_allow_delete": null,
  "index.translog.sync_interval": "60s",
  "index.refresh_interval" : "60s"
}

PUT _cluster/settings
{
  "persistent" : {
    "cluster.max_shards_per_node" : 1000,
    "cluster.routing.allocation.total_shards_per_node" : null,
    "cluster.routing.allocation.enable": "all",
    "cluster.routing.rebalance.enable": "none",
    "cluster.routing.allocation.allow_rebalance": "indices_all_active",
    "cluster.routing.allocation.cluster_concurrent_rebalance": 2,
    "cluster.routing.allocation.node_concurrent_recoveries": 2,
    "cluster.routing.allocation.balance.threshold": 1.0,
    "cluster.routing.allocation.disk.watermark.low": "70%",
    "cluster.routing.allocation.disk.watermark.high": "95%",
    "cluster.routing.allocation.disk.watermark.flood_stage": "98%"
  }
}

My overview monitoring looks mostly normal, aside from the indexing speed being incredibly low with occasional spikes up to the old speed:

Finally, it looks like a few nodes in my cluster have high heap usage, but I'm not seeing much in the way of expensive searches or aggregations being run.

I'm honestly at a bit of a loss for what could be causing such degraded performance, and the random spikes of "normal" performance are even more confusing. Any help or ideas would be greatly appreciated.

Christian_Dahlqvist · June 26, 2023, 4:43am

Which version of Elasticsearch are you using?

What type of storage are you using? Is it gp3 EBS?

What bulk size is NiFi configured to use?

How many indices and shards are you actively indexing into?

What is the average size of the shards you are indexing into?

Are you assigning your own document IDs or letting Elasticsearch generate IDs for you?

Can you share the full output of the cluster stats API?
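For the shard count and average shard size questions, a rough sketch like the one below may help. It assumes you have already saved the parsed JSON from `GET _cat/shards?format=json&bytes=b` (which returns one row per shard, with `store` given in bytes as a string). The function name `summarize_primaries` and the sample index names are made up for illustration; they are not from your cluster.

```python
from collections import defaultdict


def summarize_primaries(rows):
    """Return {index: (primary_shard_count, average_primary_size_bytes)}.

    `rows` is the parsed JSON list from `GET _cat/shards?format=json&bytes=b`.
    """
    sizes = defaultdict(list)
    for row in rows:
        # Skip replicas and shards with no store size (e.g. unassigned shards).
        if row.get("prirep") != "p" or row.get("store") is None:
            continue
        sizes[row["index"]].append(int(row["store"]))
    return {index: (len(v), sum(v) / len(v)) for index, v in sizes.items()}


# Hypothetical sample rows, shaped like the _cat/shards JSON output.
sample = [
    {"index": "logs-2023.06", "shard": "0", "prirep": "p", "state": "STARTED", "store": "1000"},
    {"index": "logs-2023.06", "shard": "1", "prirep": "p", "state": "STARTED", "store": "3000"},
    {"index": "logs-2023.06", "shard": "0", "prirep": "r", "state": "STARTED", "store": "1000"},
    {"index": "logs-2023.07", "shard": "0", "prirep": "r", "state": "UNASSIGNED", "store": None},
]
summary = summarize_primaries(sample)
for index, (count, avg_bytes) in sorted(summary.items()):
    print(f"{index}: {count} primaries, avg {avg_bytes / 1024**3:.6f} GiB")
```

Only primaries are counted here so that replicas do not skew the average; drop the `prirep` check if you want every shard copy included.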

system · July 24, 2023, 4:44am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
