Re-evaluating shard setup

claohr · September 24, 2020, 10:44am

I'm running a 3-node ES cluster v7.7 on AWS (26 GB EBS volume, 4GB memory, c5.large per node) for 1 index with the following settings:

PUT /my-index
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  }
}

After initial research, I understood that I will not get the shard allocation "right" unless I know at least the size of the index etc. ahead of time. Therefore, I believe now is the time to re-evaluate the setup.

This index currently holds ~37M searchable documents (deleted documents currently sit at ~4M) and it's occupying 5GB. It is not expected to grow rapidly; by the end of the year it might increase to 40M docs.

What I'd like to understand is what I got wrong, as the cluster is experiencing the following:

One node not receiving search requests (screenshot attached)

At random points in time, the cluster has spikes in search latency of over 6 seconds and HTTP 400 response codes are thrown, without any indication of a CPU spike. The only way it is "resolved" is by not sending any more requests to the ES cluster.

Wolfram_Haussig · September 24, 2020, 10:52am

Hi Chris,

The first point is to be expected, I think: you have 3 nodes and configured only 1 primary and 1 replica shard, so one server has the primary and another has the replica. The third server does not hold anything. You might want to increase the number of replicas to 2 so the queries can use all available servers.
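The arithmetic behind this reply can be sketched in a few lines of Python. This is only an illustration: the helper names are made up for the example, and it assumes a single index whose shard copies are spread across as many nodes as possible. The last helper builds the JSON body that could be sent with PUT /my-index/_settings to raise the replica count.

```python
import json


def shard_copies(primaries: int, replicas: int) -> int:
    """Total shard copies for one index: each primary plus its replicas."""
    return primaries * (1 + replicas)


def idle_nodes(nodes: int, primaries: int, replicas: int) -> int:
    """Nodes left without any copy, assuming copies are spread evenly."""
    return max(0, nodes - shard_copies(primaries, replicas))


def replica_settings_body(replicas: int) -> str:
    """JSON body for an index settings update (PUT /my-index/_settings)."""
    return json.dumps({"index": {"number_of_replicas": replicas}})


if __name__ == "__main__":
    # Setup from the first post: 3 nodes, 1 primary, 1 replica.
    print("idle nodes now:", idle_nodes(3, 1, 1))
    # Suggested change: 2 replicas, so every node holds a copy.
    print("idle nodes with 2 replicas:", idle_nodes(3, 1, 2))
    print("settings body:", replica_settings_body(2))
```

With 1 primary and 1 replica there are only two shard copies for three nodes, so one node has nothing to serve searches from.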

Unfortunately, I do not have an idea why you have such spikes in latency.

Best regards
Wolfram

Christian_Dahlqvist · September 24, 2020, 10:57am

What type of instances are you using? What type of storage?

claohr · September 24, 2020, 11:00am

Thank you for your response Wolfram

claohr · September 24, 2020, 11:02am

Hi Christian, I've updated my post. Each node is set up with 26GB EBS storage, 8GB Memory

Christian_Dahlqvist · September 24, 2020, 11:03am

Which instance type are you using? t2?

claohr · September 24, 2020, 11:10am

It is the compute optimised c5.large

Christian_Dahlqvist · September 24, 2020, 11:14am

There are a couple of things I can think of that could cause latency spikes. The first is GC. Could you check in the logs if there is any long GC reported around the time of the spike? It is also worth noting that EBS IOPS are proportional to the volume size unless you have PIOPS. Are you monitoring disk I/O so you can see if there is any correlation?
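To put a rough number on the IOPS point: the thread does not say which EBS volume type is in use, so the sketch below assumes general purpose gp2 volumes, whose baseline is 3 IOPS per GiB with a floor of 100 and a cap of 16,000. The function name is hypothetical.

```python
def gp2_baseline_iops(size_gib: int) -> int:
    """Baseline IOPS of a gp2 volume: 3 per GiB, at least 100, at most 16000."""
    return min(16000, max(100, 3 * size_gib))


if __name__ == "__main__":
    # Volume sizes mentioned in this thread: 26 GB and 100 GB per node.
    for size in (26, 100):
        print(f"{size} GiB gp2 volume: {gp2_baseline_iops(size)} baseline IOPS")
```

Under that assumption a 26 GiB volume sits at the 100 IOPS floor and depends on burst credits for anything above it, which is one reason to check disk I/O around the latency spikes.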

claohr · September 24, 2020, 11:30am

I'm using EBS IOPS. Thank you for mentioning that, I'll look further into it.
Sorry, I don't have visibility on disk I/O at the time of the spike.

The only screenshot I can provide is this one, which shows the 2 spikes in the threadpool and the GC metrics

Hans_Kruse · September 26, 2020, 6:58am

I am having a similar issue. For me a few things helped. I still have questions myself.

Context: I have 3 AWS nodes with 8GB memory and a 100GB SSD attached to each

Limited shards to 15GB with ILM. Why not 8 or 3.5GB? 20GB is slower, I tried.
I use 3 primaries and 1 replica. Wrong?
Turned off atime on the SSD. It no longer writes on a read.
Ensured memory lock and max open files were set in systemd config
Limited the max JVM memory to 3.5 GB. This allows the JVM to use 32-bit compressed pointers and gives about half the memory to the file cache. 300 MB is used by the OS.
Ensured my JSON is sorted.
No swap file or partition.
Strict schema, almost no text fields that are indexed.
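The post does not show the ILM policy itself, so the following is only a sketch of how a shard size limit could be expressed. It assumes the rollover action's max_size condition, which measures the total size of all primary shards in the index, so the per-shard target is multiplied by the number of primaries. The function name is hypothetical, and the policy name in the request path (PUT _ilm/policy/<name>) is left to the reader.

```python
import json


def rollover_policy(shard_size_gb: int, primaries: int) -> dict:
    """Build an ILM policy body that rolls over at a target size per primary.

    max_size applies to the sum of all primary shards, so the per-shard
    target is scaled by the number of primaries.
    """
    max_size_gb = shard_size_gb * primaries
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        "rollover": {"max_size": f"{max_size_gb}gb"}
                    }
                }
            }
        }
    }


if __name__ == "__main__":
    # Values from the post: 15GB per shard, 3 primaries.
    print(json.dumps(rollover_policy(15, 3), indent=2))
```

Because the condition is evaluated on the whole index, actual shard sizes only approximate the target when documents are spread evenly across the primaries.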

My data is time based but my queries are not always time limited. No data is thrown away.
I have 400+ fields per document. Mostly keywords, only a few text fields.

Limiting the shard size from 40GB to 15GB seemed to do the most for me. Performance went from dramatically slow to wow, fast.

Having guidance from Elastic for several very different use cases would be nice. It is a bit of whack-a-mole now.

What size should my shards be?
What is the optimal AWS VM config?
Elastic config?
Schema tuning?
Use nested, flattened?
Turn off source?
Compress?
Leaving most settings at their defaults works! But limiting the max shard size helped a lot.

system · October 24, 2020, 6:58am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
