Architecting cluster for fast searching

I am trying to architect a cluster optimized for search speed. Right now we have three dedicated data nodes on bare metal, each with 64 cores, 755GB RAM, and 15TB of SSD storage, along with 3 dedicated master nodes on VMs (4 cores, 32GB RAM). We index ~1600GB of data into 1 index, with 6 shards and 1 replica (~130GB per shard). We are willing to compromise indexing speed to maximize search speed.

We are currently seeing response times up to 5000ms at the 99th percentile on our queries. We are wondering if there are any immediate red flags in our architecture. Please let me know if more information is required to give proper advice.

Would it be best to eliminate the dedicated master nodes on VMs and just run the 3 bare metal servers as combined master/data nodes?

How can we maximize CPU and memory usage on these massive data nodes? Per the Elasticsearch documentation, we are only allocating 31GB of heap to Elasticsearch. Our telemetry indicates that the data nodes are not coming anywhere close to being maxed out on CPU, memory, or I/O.
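For context, the 31GB heap mentioned above is typically set in `config/jvm.options`; a sketch of the relevant lines, assuming a default install layout:

```
## config/jvm.options — keep Xms and Xmx equal, and below the
## compressed-oops cutoff (just under 32GB on most JVMs)
-Xms31g
-Xmx31g
```

The remaining RAM is not wasted: the OS uses it as filesystem cache for Lucene segment files, which is usually where the search-speed benefit comes from.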

Any and all advice is highly appreciated!


What type of data are you indexing? Log data, or some custom data model?

130GB per shard is too large. Ideally we recommend something between 10GB and 50GB, depending on the data model.

To increase search speed you should also consider using more replicas. That will improve search performance if you have lots of parallel requests, since replicas can serve searches too.
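Replica count is a dynamic index setting, so it can be changed without reindexing; a sketch (the index name `my-index` is a placeholder for yours):

```
PUT /my-index/_settings
{
  "index": {
    "number_of_replicas": 2
  }
}
```

Note each extra replica costs a full copy of the data, so with ~1600GB of primaries, check disk headroom first.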

If your indices are not constantly being written to or updated, it could also help to run `_forcemerge` to reduce the number of segments.
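For reference, a force-merge down to a single segment per shard looks like this (again, `my-index` is a placeholder, and this should only be run on indices that are no longer being written to):

```
POST /my-index/_forcemerge?max_num_segments=1
```

The merge is I/O-heavy while it runs, so schedule it off-peak.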

But yeah, the #1 to-do would be to reduce the shard size.

Let me know how that goes...

We are indexing a custom data model. We re-indexed our data using 30 shards (~20GB per shard). This gave us a very small performance boost.
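For anyone following along, splitting into more primaries requires reindexing into a new index, roughly like this (index names and shard count here are illustrative, not our exact setup):

```
PUT /my-index-v2
{
  "settings": {
    "index": {
      "number_of_shards": 30,
      "number_of_replicas": 1
    }
  }
}

POST /_reindex
{
  "source": { "index": "my-index" },
  "dest":   { "index": "my-index-v2" }
}
```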

So far, our biggest performance boost came from decreasing the heap enough to enable compressed ordinary object pointers (26GB in this case). We have found that decreasing the heap to 16GB seems to yield the best results, but still not where we would like to be.
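Whether a given heap size still qualifies for compressed oops can be checked with the JVM directly, outside Elasticsearch (this assumes the `java` on your PATH is the same JVM Elasticsearch runs on):

```
java -Xmx26g -XX:+PrintFlagsFinal -version | grep -i UseCompressedOops
```

If the flag prints as `false` at your chosen `-Xmx`, the heap is past the cutoff and every object pointer doubles in size.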

We are now exploring G1GC rather than CMS. I will report back with our findings, though we are still open to any other pointers that people may have.
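For reference, the CMS-to-G1GC switch above is done in `config/jvm.options` by swapping the GC flags; a sketch, with values that are assumptions to tune rather than recommendations:

```
## config/jvm.options — comment out the CMS flags
## (-XX:+UseConcMarkSweepGC and related lines) and add:
-XX:+UseG1GC
-XX:G1ReservePercent=25
-XX:InitiatingHeapOccupancyPercent=30
```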

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.