ES Cluster Perf is lower than single node ES

Yogesh_BG · August 1, 2020, 12:54pm

Hi

I am using ES version 6.6.0.

I use ES for a heavy indexing load: almost 20 MB/s, ~80K rps.
Hardware details: each node has 88 CPUs (44 cores), 32 GB RAM for ES plus 32 GB for Lucene, and 5 SSDs in RAID 10.
Each index is 60 GB of primary data, 120 GB in total.

Traffic is continuous. We keep the last 13 indices and delete older indices after 12 hours.

When I run this test with a single ES data node (one of the 3 nodes, 0 replicas), everything goes well and it performs very well. But when I add two more data nodes and set replicas to 1, everything falls apart and I cannot get much throughput. Reducing replicas to 0 on the 3-node cluster did not help either.

Here are the index settings:

{
  "logs-2020.08.01.12": {
    "settings": {
      "index": {
        "mapping": {
          "total_fields": {
            "limit": "3000"
          }
        },
        "refresh_interval": "30s",
        "number_of_shards": "1",
        "translog": {
          "flush_threshold_size": "1g",
          "sync_interval": "10s",
          "durability": "async"
        },
        "provided_name": "logs-2020.08.01.12",
        "merge": {
          "scheduler": {
            "max_thread_count": "16"
          }
        },
        "creation_date": "1596285442601",
        "unassigned": {
          "node_left": {
            "delayed_timeout": "10m"
          }
        },
        "number_of_replicas": "0",
        "uuid": "7avWG6PLQHmTapyxl7nJEg",
        "version": {
          "created": "6060099"
        }
      }
    }
  }
}

Can anyone help me figure out what could be wrong? The 3-node setup sustains only 50K rps; beyond that, old-gen heap usage goes to 100%, and when I reduce the traffic it recovers. Why does it work on one node but not on three?

Christian_Dahlqvist · August 1, 2020, 1:03pm

Try setting the number of primary shards for the index to 3 and leave the number of replicas at 0. Replicas do the same work as the primary, so adding a replica shard will increase resiliency but likely reduce indexing throughput. One factor that can impact performance as soon as you start clustering is network performance. What kind of networking do you have in place?
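To illustrate why three primaries (rather than one) let all three nodes share the indexing work, here is a simplified sketch of how documents spread across primary shards. Elasticsearch picks a shard from a hash of the document's `_routing` value (the `_id` by default) modulo the number of primary shards; the real formula uses Murmur3 and also involves `index.number_of_routing_shards`. This sketch substitutes Python's `zlib.crc32` as a stand-in hash, and the shard-to-node placement (shard `i` on node `i % num_nodes`) is a hypothetical assumption; the function names are illustrative only.

```python
import zlib
from collections import Counter


def shard_for(doc_id: str, num_primaries: int) -> int:
    """Pick a primary shard for a document.

    Stand-in for Elasticsearch routing: ES hashes _routing (default _id)
    with Murmur3; crc32 is used here only for illustration.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_primaries


def docs_per_node(doc_ids, num_primaries: int, num_nodes: int) -> Counter:
    """Count how many documents each node would index as primary.

    Assumes a hypothetical placement where shard i lives on node i % num_nodes.
    """
    counts = Counter()
    for doc_id in doc_ids:
        shard = shard_for(doc_id, num_primaries)
        counts[shard % num_nodes] += 1
    return counts


if __name__ == "__main__":
    ids = [f"doc-{i}" for i in range(30_000)]
    # One primary: a single node does all the primary indexing work.
    print(docs_per_node(ids, num_primaries=1, num_nodes=3))
    # Three primaries: the work is spread over all three nodes.
    print(docs_per_node(ids, num_primaries=3, num_nodes=3))
```

With one primary every document lands on node 0; with three primaries all three nodes receive a share. Since replicas repeat the primary's work, keeping replicas at 0 while raising primaries is what spreads the load without adding work.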

Yogesh_BG · August 1, 2020, 2:28pm

Hi

By default I had 3 primary shards and 0 replicas. I also tried 6 shards.

We have Elasticsearch installed in Kubernetes pods. Using iftop -B, I see a lot of traffic between the pods, ~18 MB/s.

So I tried with only one ES node, and throughput improved a lot. But as soon as I have 3 containers, the issues start.

What kind of network do you mean? These are standard Dell servers with ~1 Gbps interfaces.

The example above shows 1 shard because I reduced the number of nodes to 1, sorry about that.
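The ~18 MB/s of inter-pod traffic mentioned in this post came from iftop -B. If iftop is not installed in the pods, a comparable per-interface rate can be computed from two readings of Linux's `/proc/net/dev` counters. The following is a minimal, hypothetical helper (the function names are illustrative, not part of any tool), assuming the standard `/proc/net/dev` layout: two header lines, then one line per interface where the first field after the colon is received bytes and the ninth is transmitted bytes.

```python
import time
from pathlib import Path


def parse_proc_net_dev(text: str) -> dict:
    """Return {interface: (rx_bytes, tx_bytes)} from /proc/net/dev content."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        if ":" not in line:
            continue
        iface, data = line.split(":", 1)
        fields = data.split()
        # Field 0 is received bytes, field 8 is transmitted bytes.
        counters[iface.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def throughput(before: str, after: str, seconds: float) -> dict:
    """Return {interface: (rx_bytes_per_s, tx_bytes_per_s)} between two snapshots."""
    old, new = parse_proc_net_dev(before), parse_proc_net_dev(after)
    return {
        iface: (
            (new[iface][0] - old[iface][0]) / seconds,
            (new[iface][1] - old[iface][1]) / seconds,
        )
        for iface in new
        if iface in old
    }


def measure(seconds: float = 5.0, path: str = "/proc/net/dev") -> dict:
    """Sample the counters twice, `seconds` apart, and return per-interface rates."""
    first = Path(path).read_text()
    time.sleep(seconds)
    second = Path(path).read_text()
    return throughput(first, second, seconds)
```

For example, `measure(5.0)` sleeps for five seconds and returns `(rx, tx)` rates in bytes per second per interface; divide by 1e6 for MB/s. Run inside a pod, it reports the interfaces of that pod's network namespace.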

Here are some of the settings:

indices.fielddata.cache.size: 10%
indices.memory.index_buffer_size: 30%
thread_pool.write.queue_size: 2000

"index.number_of_replicas": 1,
"index.number_of_shards": 3,
"index.merge.scheduler.max_thread_count": 16,
"index.refresh_interval": "30s",
"index.translog.durability": "async",
"index.translog.flush_threshold_size": "1g",
"index.translog.sync_interval": "10s",
"index.unassigned.node_left.delayed_timeout": "10m",
"index.mapping.total_fields.limit": 3000

Christian_Dahlqvist · August 1, 2020, 3:25pm

If you only have ~1 Gbps networking, that could very well be what is limiting indexing throughput. I have seen nodes limited by the network at even lower throughput levels than you are seeing. A cluster transfers a lot more data than a standalone node.
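A back-of-envelope sketch of why a cluster moves more data than a standalone node. It rests on a simplified model whose assumptions are listed in the docstring (even spread of client requests and primaries, one network copy per allocated replica, no protocol or container-network overhead); it is not measured Elasticsearch behaviour, and the function names are hypothetical.

```python
def intra_cluster_bytes_per_sec(ingest_bps: float, nodes: int, replicas: int) -> float:
    """Rough estimate of document bytes per second sent between nodes.

    Simplified model (assumptions, not measured Elasticsearch behaviour):
    - clients spread bulk requests evenly over all nodes, and primaries are
      spread evenly, so a fraction (nodes - 1) / nodes of documents must be
      forwarded from the receiving node to the node holding the primary;
    - every allocated replica receives one copy over the network, because a
      replica is never placed on the same node as its primary;
    - replicas beyond nodes - 1 stay unassigned and generate no traffic;
    - TCP, protocol and container-network overhead are ignored.
    """
    if nodes < 1:
        raise ValueError("nodes must be >= 1")
    forwarded_fraction = (nodes - 1) / nodes
    allocated_replicas = min(replicas, nodes - 1)
    return ingest_bps * (forwarded_fraction + allocated_replicas)


def total_network_bytes_per_sec(ingest_bps: float, nodes: int, replicas: int) -> float:
    """Client-to-cluster traffic plus the estimated node-to-node traffic."""
    return ingest_bps + intra_cluster_bytes_per_sec(ingest_bps, nodes, replicas)


if __name__ == "__main__":
    MB = 1_000_000
    ingest = 20 * MB  # ~20 MB/s of indexing traffic, as reported in this thread
    for nodes, replicas in [(1, 0), (3, 0), (3, 1)]:
        intra = intra_cluster_bytes_per_sec(ingest, nodes, replicas)
        total = total_network_bytes_per_sec(ingest, nodes, replicas)
        print(f"{nodes} node(s), {replicas} replica(s): "
              f"node-to-node {intra / MB:.1f} MB/s, total {total / MB:.1f} MB/s")
```

Under these assumptions node-to-node traffic is 0 MB/s on a single node, about 13.3 MB/s with three nodes and no replicas, and about 33.3 MB/s with one replica. Because overhead is ignored, real traffic will be higher, and the estimate alone does not say whether a particular link is saturated.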

How did you arrive at the non-default settings you have here?

Yogesh_BG · August 1, 2020, 4:17pm

I am not sure about the network interface bandwidth, but I will get the details and share them here.

We have been using ES for almost 3 years. I have asked many questions in this forum and you have always helped me with a lot of answers; thanks for that.

I have tuned these settings over many days. Until now we were using HDDs and could reach at most 20K rps; now we have SSDs and are trying to get the maximum rps.

Yogesh_BG · August 1, 2020, 4:29pm

Here is the network info

driver: tg3
version: 3.137
firmware-version: FFV21.60.2 bc 5720-v1.39
expansion-rom-version:
bus-info: 0000:04:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no

It's 1 Gbps.

Christian_Dahlqvist · August 1, 2020, 4:42pm

If the indexing rate goes down with three nodes compared to a single node even if you configure no replicas I usually suspect that network performance is the limiting factor, especially as you have lots of RAM, CPU and fast disks. If that is the case there is not a lot of room for further tuning.

Yogesh_BG · August 1, 2020, 4:52pm

Thanks for your help

Yogesh_BG · August 5, 2020, 1:59pm

Hi @Christian_Dahlqvist

Thanks for your help. Yes, the network was the issue: we replaced it with a 10 GbE NIC, and now we reach up to 150K rps.

system · September 2, 2020, 2:00pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
