I have a weird issue after migrating from Elasticsearch 2.4 to 6.3.
My cluster consists of 10 nodes:
3 master nodes (m3.medium, 1 CPU / 4 GB RAM)
3 client nodes running ES, Logstash, and Kibana (c5.xlarge, 4 CPU / 8 GB RAM)
4 data nodes (r4.2xlarge, 8 CPU / 61 GB RAM)
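To double-check that the role split actually looks like this, something like the following against the local client node lists each node's roles (just a sketch; I'm assuming the client node answers on localhost:9200):

```python
import requests

# Sketch: confirm each node's role matches the intended topology.
# Assumes the local client node listens on localhost:9200.
resp = requests.get(
    "http://localhost:9200/_cat/nodes",
    params={"h": "name,node.role,master", "format": "json"},
)
for node in resp.json():
    # node.role is a letter combination, e.g. "m" = master-eligible,
    # "d" = data, "i" = ingest; "-" in the master column marks non-masters.
    print(node["name"], node["node.role"], node["master"])
```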
Data inflow looks like this:
The load balancer sends data to a Logstash TCP listener.
Kibana and Logstash send data and queries to the localhost client node, which is then supposed to load-balance them across the data nodes.
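To illustrate the query path: a search enters through the local client (coordinating) node, which fans it out to the data nodes and gathers the results. A minimal sketch (the index pattern logstash-* is just a placeholder, not my actual index name):

```python
import requests

# Sketch: send a trivial search through the local coordinating node.
# "logstash-*" is a placeholder index pattern.
resp = requests.post(
    "http://localhost:9200/logstash-*/_search",
    json={"query": {"match_all": {}}, "size": 1},
)
body = resp.json()
print("took (ms):", body["took"])
print("shards queried:", body["_shards"]["total"])
```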
The problem is that queries take much longer than they did in 2.4, and Datadog reports that one of the data nodes receives more queries than the others.
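One way to confirm the imbalance directly from Elasticsearch, rather than only from Datadog, is to compare the cumulative per-node search counters (sketch; same localhost:9200 assumption):

```python
import requests

# Sketch: compare cumulative search activity per node to spot a hot node.
resp = requests.get("http://localhost:9200/_nodes/stats/indices/search")
for node in resp.json()["nodes"].values():
    search = node["indices"]["search"]
    print(
        node["name"],
        "queries:", search["query_total"],
        "avg ms:", search["query_time_in_millis"] / max(search["query_total"], 1),
    )
```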
I don't understand why... Can anybody help here?
I'm continuing to check for any differences in configuration.
I've checked, and shards are distributed evenly across all data nodes.
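For reference, this is roughly how I counted shards per node (sketch):

```python
from collections import Counter

import requests

# Sketch: count how many shards each data node currently holds.
resp = requests.get(
    "http://localhost:9200/_cat/shards",
    params={"h": "index,prirep,node", "format": "json"},
)
# Skip unassigned shards, which have no node.
per_node = Counter(row["node"] for row in resp.json() if row["node"])
for node, count in per_node.most_common():
    print(node, count)
```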
I switched from 2 shards per index to 4 about a week ago, but the timeouts happen even on queries over shorter time ranges.
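To see which indices actually picked up the new shard count, the primary and replica counts per index can be listed like this (sketch):

```python
import requests

# Sketch: list primary/replica counts per index to see which
# indices were created with the new 4-shard setting.
resp = requests.get(
    "http://localhost:9200/_cat/indices",
    params={"h": "index,pri,rep", "format": "json"},
)
for row in sorted(resp.json(), key=lambda r: r["index"]):
    print(row["index"], "primaries:", row["pri"], "replicas:", row["rep"])
```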
I'm using allocation (routing) awareness based on availability zone, and I just realized that node-0 and node-3 are in the same AZ while the others are each in their own. I will redeploy a new instance in a fourth zone and see if that helps.
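If searches prefer shards within the caller's awareness group, two data nodes sharing an AZ could plausibly explain the hot node. This is how I'm verifying the zone attribute each node actually reports (sketch; I'm assuming the attribute is named zone, i.e. set via node.attr.zone in elasticsearch.yml):

```python
import requests

# Sketch: list the awareness attribute each node reports.
# The attribute name ("zone" here) depends on what node.attr.* is set to.
resp = requests.get(
    "http://localhost:9200/_cat/nodeattrs",
    params={"h": "node,attr,value", "format": "json"},
)
for row in resp.json():
    if row["attr"] == "zone":
        print(row["node"], "->", row["value"])
```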