Only one node in a 6-node cluster shows slow queries and high read I/O when querying

Hi All,

I have a problem when querying. My cluster has 6 data nodes.
When I query, only one node shows unusual symptoms, even though my query is very simple.
I'm running stress tests, and only one node shows high read I/O (3000/s) and high latency.
Unlike the other nodes, it also shows low CPU usage.

Issue node (max):
  - CPU: 12%
  - Read I/O: 3000/s
  - Latency: 500ms
  - Cgroup CPU performance: ~600M ns

Other nodes (max):
  - CPU: 33%
  - Read I/O: 10/s
  - Latency: 10ms
  - Cgroup CPU performance: ~2B ns

Why is this happening?
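(For reference, per-node search activity could be compared with a small script along these lines; the host is a placeholder, and the _cat/thread_pool columns are the standard ones.)

# Sketch: poll the per-node search thread pool to see where search work lands.
# The Elasticsearch endpoint below is a placeholder.
import time
import requests

ES = "http://localhost:9200"  # placeholder endpoint

while True:
    resp = requests.get(
        f"{ES}/_cat/thread_pool/search",
        params={"format": "json", "h": "node_name,active,queue,rejected"},
    )
    resp.raise_for_status()
    for row in resp.json():
        print(row["node_name"], "active:", row["active"],
              "queue:", row["queue"], "rejected:", row["rejected"])
    print("---")
    time.sleep(5)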

This is my query

{
    "sort": [
        {
            "rank_score": "desc"
        }
    ],
    "from": 0,
    "size": 1005,
    "_source": false,
    "docvalue_fields": [
        "attributes","acronym","channel_id",
        "channel_name","channel_number","lineup_id",
        "original_logo_image_url","source_id","recency_norm",
        "popularity_norm","search_popularity_norm"
    ],
    "query": {
        "constant_score": {
            "filter": {
                "bool": {
                    "must": [
                        {
                            "term": {
                                "lineup_id": {
                                    "value": "42258"
                                }
                            }
                        }
                    ]
                }
            },
            "boost": 1
        }
    }
}

Are you distributing queries evenly across the cluster? Are queried shards and indices evenly distributed across the cluster? Which version of Elasticsearch are you using?

Hello Christian_Dahlqvist

Primary shards: 1
Replica shards: 5

Each node holds one shard copy, and my Elasticsearch version is 7.6.1.
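If it helps, shard placement can be confirmed with something like this (the host and the index name "channels" are placeholders):

# Sketch: list each shard copy and the node it lives on, to confirm the
# 1-primary / 5-replica layout really is spread over all 6 data nodes.
import requests

ES = "http://localhost:9200"   # placeholder endpoint
INDEX = "channels"             # placeholder index name

resp = requests.get(
    f"{ES}/_cat/shards/{INDEX}",
    params={"format": "json", "h": "index,shard,prirep,state,node"},
)
resp.raise_for_status()
for row in resp.json():
    kind = "primary" if row["prirep"] == "p" else "replica"
    print(f'{row["index"]} shard {row["shard"]} ({kind}, {row["state"]}) on {row["node"]}')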

How are you running your test?

If distribution is even, it may be worthwhile checking for hardware problems on the node in question.

The queries use random values.
Also, the instances run on AWS. Even if I terminate the affected instance, the same symptoms appear on another node.
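A simplified sketch of the load generator (the host, index name, and random value range are placeholders):

# Sketch of the stress test: send the same query at roughly 60 requests/sec,
# randomizing only the lineup_id value. Host, index name, and value range are placeholders.
import random
import time
import requests

ES = "http://localhost:9200"   # placeholder endpoint
INDEX = "channels"             # placeholder index name

def build_query(lineup_id: str) -> dict:
    # Same shape as the query posted above (docvalue_fields omitted for brevity);
    # only the lineup_id value changes between requests.
    return {
        "sort": [{"rank_score": "desc"}],
        "from": 0,
        "size": 1005,
        "_source": False,
        "query": {
            "constant_score": {
                "filter": {"bool": {"must": [
                    {"term": {"lineup_id": {"value": lineup_id}}}
                ]}},
                "boost": 1,
            }
        },
    }

while True:
    lineup_id = str(random.randint(1, 99999))  # placeholder value range
    resp = requests.post(f"{ES}/{INDEX}/_search", json=build_query(lineup_id))
    print(resp.status_code, "took:", resp.json().get("took"), "ms")
    time.sleep(1 / 60)  # roughly 60 requests per second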

This is what happens when I have 2 shards:

Issue nodes (2 nodes, max):
  - CPU: 15%
  - Read I/O: 3000/s
  - Latency: 500ms

Other nodes (max):
  - CPU: 95%
  - Read I/O: 2500/s
  - Latency: 700ms

I think it's related to an Elasticsearch mechanism.
Does ES direct work such as disk I/O to particular nodes?
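Is it something like replica selection? In 7.x the shard copy used for each search is picked per request (adaptive replica selection is on by default), so I'm thinking of checking that setting with something like this (the host is a placeholder):

# Sketch: check whether adaptive replica selection is enabled
# (it is by default in Elasticsearch 7.x). The endpoint is a placeholder.
import requests

ES = "http://localhost:9200"  # placeholder endpoint

resp = requests.get(
    f"{ES}/_cluster/settings",
    params={"include_defaults": "true", "flat_settings": "true"},
)
resp.raise_for_status()
settings = resp.json()
key = "cluster.routing.use_adaptive_replica_selection"
for section in ("persistent", "transient", "defaults"):
    if key in settings.get(section, {}):
        print(section, key, "=", settings[section][key])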

I cannot tell based on the information you have provided so far. I think we need more details about how the cluster is set up and how you are running the test.

Cluster:

AWS EKS (c5.2xlarge * 6 instances)
Master: 3 nodes
Data: 6 nodes
Index: 1 primary shard, 5 replicas
Requests per second: 60

In the query below, only the "value" is changed to a random number.

{
    "sort": [       {            "rank_score": "desc"        }    ],
    "from": 0,
    "size": 1005,
    "_source": false,
    "docvalue_fields": [
        "attributes","acronym","channel_id",
        "channel_name","channel_number","lineup_id",
        "original_logo_image_url","source_id","recency_norm",
        "popularity_norm","search_popularity_norm"
    ],
    "query": {
        "constant_score": {
            "filter": {
                "bool": {
                    "must": [
                        {
                            "term": {
                                "lineup_id": {
                                    "value": "42258" <- changed this line
                                }
                            }
                        }
                    ]
                }
            },
            "boost": 1
        }
    }
}
