I need to optimize my Elasticsearch cluster to get better performance on all kinds of queries.
The current configuration of the cluster is:
Elasticsearch 6.4.0
3 nodes (each one master, data, and ingest)
- 4 core
- 16 GB ram (heap 8GB)
- 1 EBS disk of 1TB
~ 200 indexes
~ 4200 shards (12 shards per index, replica 1: each primary plus one copy)
~ 1.5 billion documents
~ 1.5 TB of data stored
More details about the indexes:
- 1 index of 100GB
- 80 indexes of 2GB each
- 80 indexes of 7GB each
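For context, the figures above can be checked with the `_cat` APIs; a couple of requests I use to see index sizes and how shards are balanced across the three nodes (Kibana Dev Tools syntax):

```
GET _cat/indices?v&h=index,pri,rep,docs.count,store.size&s=store.size:desc

GET _cat/allocation?v
```

The second request shows the shard count per node; with ~4200 shards on 3 nodes that is roughly 1400 shards per node.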
Continuous bulk update queries on the 100GB index (refresh interval 2s).
Inserts on the other indexes (about 10 inserts/second, refresh interval 60s).
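For reference, the refresh intervals described above correspond to index settings like these (index names are made up):

```
PUT big_index/_settings
{
  "index": { "refresh_interval": "2s" }
}

PUT index_date_2018_09/_settings
{
  "index": { "refresh_interval": "60s" }
}
```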
Search queries on all the indexes; many of them use the index_date_* pattern and therefore have to scan a lot of indexes (~80).
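A typical query of this kind fans out over every index matching the pattern; a simplified sketch (field names here are hypothetical, not my real mapping):

```
GET index_date_*/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term":  { "status": "active" } },
        { "range": { "created_at": { "gte": "now-30d" } } }
      ]
    }
  }
}
```

Each such request hits 12 shards per matched index, so ~80 indexes means ~960 shards queried per search.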
The biggest problem I have with this configuration is the high load on the master node whenever a query that scans 80 indexes is launched. A single query takes about 2 seconds, but many of these queries are launched in parallel, and the execution time then increases (to 20 seconds and more) while the master node's load becomes very high.
I was thinking of a new cluster configuration like this:
3 nodes (data, ingest) (AWS EC2 i3.xlarge instances)
- 4 cores
- 30 GB ram (heap 24GB)
- Local storage NVMe
3 nodes (master) (AWS EC2 i3.large instances)
- 2 cores
- 15GB ram (heap 9GB)
- Local storage NVMe
I know the only way to validate the configuration is to test it in the field, but in your experience, would this be a good starting point for managing my data?