Scaling and optimisation advice

Hello,

I would like some advice about scaling up an Elasticsearch cluster I currently manage. The cluster is used to store logs from many different environments (dev, staging, ...). Each environment gets its own index template, and indices are created daily (Logstash-style date suffix). We currently keep 10 days of indices per environment.

Index size varies between environments.
Our biggest daily index has around 35GB of data, for 140 million documents.
Our smallest daily index has around 100MB of data.

At the moment, the cluster ingests about 250 million documents per day in total (~3k/s on average).
I would like to configure it to handle 1 billion documents per day (~12k/s).

The data is sent to Elasticsearch using Filebeat and Logstash, from various locations.
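
To give an idea of the pipeline, the Logstash side is just the standard elasticsearch output, roughly like the sketch below (the hostnames, index pattern and environment field are illustrative placeholders rather than our exact config):

# logstash pipeline output (sketch; hosts, index pattern and field names are illustrative)
output {
  elasticsearch {
    hosts => ["http://es1:9200", "http://es2:9200", "http://es3:9200"]
    # one daily index per environment, e.g. logs-staging-2018.06.01
    index => "logs-%{[environment]}-%{+YYYY.MM.dd}"
  }
}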

Our current setup is starting to struggle with the load. We see various errors, such as Logstash losing its connection to Elasticsearch, and even Elasticsearch nodes crashing. Search is getting slower and slower as more and more people use Kibana to browse the logs.

The cluster setup is very basic: 3 nodes (EC2 m4.xlarge), each with 4 vCPUs, 16GB of RAM and 800GB of SSD storage. There are no dedicated master nodes, only data nodes.
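
To be clear, every node is master-eligible as well as a data and ingest node; there is just no dedicated master. In elasticsearch.yml terms that is simply the defaults, something like the following (shown for illustration, not copied from our actual files):

# elasticsearch.yml (default node roles)
node.master: true
node.data: true
node.ingest: true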

In terms of config:

# elasticsearch.yml
thread_pool.search.queue_size: 100000
thread_pool.search.size: 20

# jvm.options
-Xms8g
-Xmx8g
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-XX:+AlwaysPreTouch
-server
-Xss1m
-Djava.awt.headless=true
-Dfile.encoding=UTF-8
-Djna.nosys=true
-Djdk.io.permissionsUseCanonicalPath=true
-Dio.netty.noUnsafe=true
-Dio.netty.noKeySetOptimization=true
-Dio.netty.recycler.maxCapacityPerThread=0
-Dlog4j.shutdownHookEnabled=false
-Dlog4j2.disable.jmx=true
-Dlog4j.skipJansi=true
-XX:+HeapDumpOnOutOfMemoryError

My first question: do you see any issues with our current setup? Are there any config changes that could already improve performance?

My second question: what would you recommend so we can scale this cluster to handle 1 billion documents per day?

Many thanks,
Jeremie

What does your current cluster look like?
GET _cat/nodes?v would help here.

What sharding strategy are you using? By that I mean: how many primary and replica shards across the index estate?
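
If you are not sure, something along these lines (the column selection is only a suggestion) will show primaries, replicas and sizes per index:

GET _cat/indices?v&h=index,pri,rep,docs.count,store.size&s=store.size:desc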

You want 4x the throughput; would you be willing to scale this cluster out 4x horizontally?

The errors and lost connections are, I presume, HTTP 429 responses returned to Logstash when Elasticsearch cannot keep up with the number of events per second (EPS) being ingested; this is the "push back" by which Elasticsearch tells us it cannot sustain that rate.
GET _cat/thread_pool?v is a good one to keep an eye on, as it shows the rejected-event counters, which you will see climb whenever you get that push back.
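
If you only want to watch the ingest side, a narrower variant of that call (assuming the pool is still called bulk on your version) keeps the output short:

GET _cat/thread_pool/bulk?v&h=node_name,name,active,queue,rejected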

I also wonder about the total number of shards:
GET _cluster/health

Hi @JKhondhu, thanks for your answer!

GET _cat/nodes?v

ip          heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
X.X.X.X               52          99  15    2.91    1.88     1.50 mdi       *      es3
X.X.X.Y               26          97  15    2.56    1.56     1.36 mdi       -      es2
X.X.X.Z               27          97  15    1.23    1.24     1.19 mdi       -      es1

For our indices, we have 3 primary shards per index and 0 replicas.
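
Those settings come from our per-environment index templates, roughly like the sketch below (the template name and pattern are placeholders, and this uses 6.x syntax; on 5.x the pattern key is template rather than index_patterns):

PUT _template/logs-example-env
{
  "index_patterns": ["logs-example-env-*"],
  "settings": {
    "index.number_of_shards": 3,
    "index.number_of_replicas": 0
  }
}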

I would be willing to scale it out 4x horizontally if the cluster were already 100% optimised, but I don't believe it is.

GET _cat/thread_pool?v

node_name          name                active queue rejected
es3                bulk                     0     0     5019
es3                fetch_shard_started      0     0        0
es3                fetch_shard_store        0     0        0
es3                flush                    0     0        0
es3                force_merge              0     0        0
es3                generic                  0     0        0
es3                get                      0     0        0
es3                index                    0     0        0
es3                listener                 0     0        0
es3                management               1     0        0
es3                refresh                  0     0        0
es3                search                   0     0        0
es3                snapshot                 0     0        0
es3                warmer                   0     0        0
es2                bulk                     0     0    22783
es2                fetch_shard_started      0     0        0
es2                fetch_shard_store        0     0        0
es2                flush                    0     0        0
es2                force_merge              0     0        0
es2                generic                  0     0        0
es2                get                      0     0        0
es2                index                    0     0        0
es2                listener                 0     0        0
es2                management               1     0        0
es2                refresh                  0     0        0
es2                search                   0     0        0
es2                snapshot                 0     0        0
es2                warmer                   0     0        0
es1                bulk                     0     0    43495
es1                fetch_shard_started      0     0        0
es1                fetch_shard_store        0     0        0
es1                flush                    0     0        0
es1                force_merge              0     0        0
es1                generic                  0     0        0
es1                get                      0     0        0
es1                index                    0     0        0
es1                listener                 0     0        0
es1                management               1     0        0
es1                refresh                  0     0        0
es1                search                   0     0        0
es1                snapshot                 0     0        0
es1                warmer                   0     0        0

Finally, the number of active shards is 2006.

GET _cluster/health

{
  "cluster_name": "logs",
  "status": "green",
  "timed_out": false,
  "number_of_nodes": 3,
  "number_of_data_nodes": 3,
  "active_primary_shards": 1360,
  "active_shards": 2006,
  "relocating_shards": 0,
  "initializing_shards": 0,
  "unassigned_shards": 0,
  "delayed_unassigned_shards": 0,
  "number_of_pending_tasks": 0,
  "number_of_in_flight_fetch": 0,
  "task_max_waiting_in_queue_millis": 0,
  "active_shards_percent_as_number": 100.0
}

Many thanks,
Jeremie

All three nodes are mdi (master, data, ingest). :+1:
Note the bulk rejections. :+1:
2006/3 ≈ 669 shards per node. :+1:
-Xms8g, -Xmx8g heap :+1:

The bulk rejections are a sign that your cluster is already pushing ingestion as hard as it can. You can start by settling on a retention period that works for you and then closing, deleting, or snapshotting away indices older than that, so the cluster gets some resources back (see the Curator sketch below).
Beyond that, add more CPU and RAM to squeeze as much as you can out of these three nodes.
I would recommend upscaling to m4.2xlarge instances and seeing how much performance you can gain.
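
If that cleanup is not automated yet, Curator can handle it. Here is a sketch of a delete action enforcing the 10-day retention; the index prefix and timestring are assumptions about your naming scheme:

# curator action file (sketch; prefix and timestring depend on your index names)
actions:
  1:
    action: delete_indices
    description: "Delete daily log indices older than 10 days"
    options:
      ignore_empty_list: True
    filters:
      - filtertype: pattern
        kind: prefix
        value: logs-
      - filtertype: age
        source: name
        direction: older
        timestring: '%Y.%m.%d'
        unit: days
        unit_count: 10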

From there, if you see that scaling 2x gives you ~6k/s, then you know what to do :slight_smile: 4x :+1:

Thanks @JKhondhu! That's very useful.

That is a lot of shards given the size of the cluster and the amount of heap you have available. I would recommend reducing the shard count. Look at this blog post for some general guidelines.
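
As one concrete illustration, the smallest environments (~100MB/day) would fit comfortably in a single primary shard, so their templates could be dropped to one shard; a sketch under those assumptions (name, pattern and 6.x syntax are placeholders):

PUT _template/logs-small-env
{
  "index_patterns": ["logs-small-env-*"],
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 0
  }
}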

