Hey everyone,
I'm using ES (9.1.7) and running into a performance issue with my Elasticsearch setup and could use some insights.
Here’s the situation: I have a cluster with 4 client nodes, 8 data nodes, and 3 master nodes. Initially, I had a single index in the cluster that contained about 400,189,709 documents. This index had 50 shards, with about 7 shards per node.
I've now split this single index into four separate indexes:
- assorted: 79,700,171 documents, 20 shards (3 shards per node)
- medical: 54,542,053 documents, 12 shards (2 shards per node)
- member analytics: 249,605,320 documents, 8 shards (1 shard per node)
- pharmacy: 16,342,165 documents, 4 shards (1 shard per node)
The total document count remains the same as the original single index.
Here’s the problem: when I run a query pulling 50k documents with just two fields (no aggregations), I see a huge performance difference. Running the query on the original single index, I get results in about 20 seconds. But when I run the same query on just the “member analytics” index (one of the split indexes), it takes around 200 seconds, which is a massive increase.
There is no visible performance fluctuation in the CPU/JVM also. SS below:
Query used:
{"size":50022,"query":{"bool":{"must":[{"bool":{"must":[{"terms":{"type":["MemberSearch"]}},{"nested":{"path":"udf24","query":{"bool":{"must":[{"range":{"udf24.fromDate":{"lte":"2025-08-31"}}},{"range":{"udf24.toDate":{"gte":"2022-08-01"}}},{"terms":{"udf24.id":["Out of State","Metro Plaza - Region 5"]}}]}},"ignore_unmapped":true}},{"bool":{"should":[{"nested":{"path":"eligibleDates","query":{"bool":{"must":[{"range":{"eligibleDates.fromDate":{"lte":"2025-08-31"}}},{"range":{"eligibleDates.toDate":{"gte":"2025-08-01"}}}]}},"ignore_unmapped":true}},{"nested":{"path":"eligibleDatesDental","query":{"bool":{"must":[{"range":{"eligibleDatesDental.fromDate":{"lte":"2025-08-31"}}},{"range":{"eligibleDatesDental.toDate":{"gte":"2025-08-01"}}}]}},"ignore_unmapped":true}},{"nested":{"path":"eligibleDatesVision","query":{"bool":{"must":[{"range":{"eligibleDatesVision.fromDate":{"lte":"2025-08-31"}}},{"range":{"eligibleDatesVision.toDate":{"gte":"2025-08-01"}}}]}},"ignore_unmapped":true}}],"minimum_should_match":"1"}},{"bool":{"must":[{"nested":{"path":"memberStatus","query":{"bool":{"must":[{"terms":{"memberStatus.eligibilityType":["medical"]}}]}},"ignore_unmapped":true}}]}},{"nested":{"path":"group","query":{"bool":{"must":[{"range":{"group.fromDate":{"lte":"2025-08-31"}}},{"range":{"group.toDate":{"gte":"2025-08-31"}}},{"term":{"group.eligibilityType":{"value":"medical"}}}]}},"ignore_unmapped":true}}]}}]}},"from":0,"_source":{"includes":["intMemberId","memberId"]},"sort":[{"totalPaidAmount":{"unmapped_type":"float","order":"desc"}}],"track_total_hits":true}
I’d love to understand what might be causing this performance gap and how I can optimize the queries to get them back to a reasonable response time. If required, I also have profiling data for both the queries.
Thanks in advance for any suggestions!
Summary
This text will be hidden
