Determining Performance Bottleneck

What is the best way to determine the performance bottleneck of the Enterprise Search stack?

We have been getting intermittent ETIMEDOUT errors when connecting to App Search via Node.js. I am able to simulate this under load and have run the following scenarios (a rough sketch of the first profile follows the list):

  1. Hold 5 requests per second for 60 seconds, ramp from 5 to 50 requests per second over 120 seconds, then sustain 50 requests per second for 600 seconds.

  2. Sustain 8 requests per second for 300 seconds.
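
For reference, here is the first profile expressed as a k6 ramping-arrival-rate scenario, just to make the shape of the load concrete. This is illustrative rather than our exact harness, and the host, engine name, API key, and filter field below are placeholders:

```js
import http from 'k6/http';

export const options = {
  scenarios: {
    app_search_ramp: {
      executor: 'ramping-arrival-rate', // drives requests per second rather than VUs
      startRate: 5,
      timeUnit: '1s',
      preAllocatedVUs: 100,
      stages: [
        { duration: '60s', target: 5 },   // hold 5 rps
        { duration: '120s', target: 50 }, // ramp 5 -> 50 rps
        { duration: '600s', target: 50 }, // sustain 50 rps
      ],
    },
  },
};

export default function () {
  // Filter-only search against App Search (host, engine, key, and field are placeholders)
  http.post(
    'https://localhost:3002/api/as/v1/engines/my-engine/search',
    JSON.stringify({ query: '', filters: { category: 'example' } }),
    {
      headers: {
        Authorization: 'Bearer search-xxxxxxxxxxxxxxxx',
        'Content-Type': 'application/json',
      },
    }
  );
}
```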

In the first test, we start seeing massive spikes in the median response time and in the number of ETIMEDOUT errors. The Elastic dashboard also showed one or both of our Enterprise Search nodes failing.

The second test is closer to what we're doing in our dev environment right now. I didn't catch any Enterprise Search node failures, but about 5% of the requests still failed with ETIMEDOUT.

When I run the first test directly against Elasticsearch, the requests complete successfully, which leads me to believe the bottleneck is in Enterprise Search. CPU does spike, but it never gets over 35%, as you can see in the screenshot at Performance - Droplr.
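
For the direct Elasticsearch comparison, the equivalent filter would look roughly like this as a bool query. This is only a sketch using the @elastic/elasticsearch 8.x JavaScript client; the index name, field, and credentials are placeholders, not the real App Search internal index:

```js
// Sketch of the equivalent filter query sent straight to Elasticsearch
// (index, field, and auth are placeholders)
const { Client } = require('@elastic/elasticsearch');

const es = new Client({
  node: 'https://localhost:9200',
  auth: { apiKey: 'base64-encoded-api-key' },
});

async function directFilterSearch() {
  const result = await es.search({
    index: 'my-engine-index', // placeholder index name
    query: {
      bool: {
        filter: [{ term: { category: 'example' } }],
      },
    },
    size: 10,
  });
  return result.hits.hits;
}
```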

One thing I should note is that our current usage is filter-based rather than full-text search. I'm not sure whether a larger number of filters makes a difference.
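
To illustrate what I mean by filter-based usage, here is a sketch of such a call using the @elastic/app-search-node client (the engine name, key, and filter fields are placeholders, not our real ones):

```js
// Filter-heavy App Search query from Node.js
// (engine, key, and field names are placeholders)
const AppSearchClient = require('@elastic/app-search-node');

const apiKey = 'search-xxxxxxxxxxxxxxxx';
const baseUrlFn = () => 'https://localhost:3002/api/as/v1/';
const client = new AppSearchClient(undefined, apiKey, baseUrlFn);

async function filterOnlySearch() {
  // Empty query string: we rely on filters rather than full-text relevance
  return client.search('my-engine', '', {
    filters: {
      all: [{ category: 'example' }, { in_stock: 'true' }],
    },
    page: { size: 20 },
  });
}
```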

Where should I be looking, and what should I be trying to improve?

Also, does that graph in Elastic Cloud show Enterprise Search instances or Elasticsearch instances? It shows instance 0 and instance 1, but doesn't specify which product they belong to.