IOPS for one data node is extremely high while the others are not

Version: Elasticsearch 7.13
I am running 3 master nodes, 2 client nodes, and 10 data nodes (199-399 GB each).
One day I noticed that the IOPS for one data node was extremely high while the others were not.

Could you please help?

Welcome to our community! :smiley:

Is it causing issues?
What do the logs on the node show?
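
It might also be worth comparing the disk I/O counters that Elasticsearch itself reports per node. As a rough sketch (the localhost:9200 host below is just a placeholder for any node in your cluster):

curl -s "http://localhost:9200/_nodes/stats/fs?human&pretty"
# On Linux, the fs.io_stats section lists cumulative read_operations and
# write_operations per device; diffing two snapshots taken a minute apart
# shows which data node is doing the extra I/O.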

Dear Warkolm, we have had some issues, but I am not sure whether they come from this problem.

I can see a log entry like this:

[parent] Data too large, data for [<http_request>] would be [4085636510/3.8gb], which is larger than the limit of [4080218931/3.7gb], real usage: [4085363984/3.8gb], new bytes reserved: [272526/266.1kb], usages [request=220205/215kb, fielddata=0/0b, in_flight_requests=573853808/547.2mb, model_inference=0/0b, accounting=0/0b]\n    at onBody 

That is not related. To help figure it out, what is the output from the _cluster/stats?pretty&human API?
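
If it helps, that can be fetched with curl against any node (the host below is a placeholder):

curl -s "http://localhost:9200/_cluster/stats?pretty&human"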

It looks like a lot of reading with very little writing so that likely rules out merging. Is your data evenly distributed across the cluster? Does this node hold any indices that are frequently queried? Do you have any other processes installed on this node that could result in data on disk being read/scanned, e.g. anti-virus or security software?
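
One rough way to check the distribution is the _cat/allocation API, which lists the shard count and disk usage per node (host is a placeholder):

curl -s "http://localhost:9200/_cat/allocation?v&s=node"
# shows shards, disk.indices, disk.used and disk.percent for each data node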

Here is the output:

{
  "_nodes": {
    "total": 10,
    "successful": 10,
    "failed": 0
  },
  "cluster_name": "prod-es-v7",
  "cluster_uuid": "rXIghHxHRTqlaQLoz6EccA",
  "timestamp": 1630438438358,
  "status": "green",
  "indices": {
    "count": 33,
    "shards": {
      "total": 190,
      "primaries": 95,
      "replication": 1,
      "index": {
        "shards": {
          "min": 2,
          "max": 24,
          "avg": 5.757575757575758
        },
        "primaries": {
          "min": 1,
          "max": 12,
          "avg": 2.878787878787879
        },
        "replication": {
          "min": 1,
          "max": 1,
          "avg": 1
        }
      }
    },
    "docs": {
      "count": 207831318,
      "deleted": 64153677
    },
    "store": {
      "size": "147.5gb",
      "size_in_bytes": 158450575176,
      "total_data_set_size": "147.5gb",
      "total_data_set_size_in_bytes": 158450575176,
      "reserved": "0b",
      "reserved_in_bytes": 0
    },
    "fielddata": {
      "memory_size": "59.1mb",
      "memory_size_in_bytes": 62016424,
      "evictions": 0
    },
    "query_cache": {
      "memory_size": "766.6mb",
      "memory_size_in_bytes": 803845834,
      "total_count": 54719348566,
      "hit_count": 11363845700,
      "miss_count": 43355502866,
      "cache_size": 179848,
      "cache_count": 367497739,
      "evictions": 367317891
    },
    "completion": {
      "size": "0b",
      "size_in_bytes": 0
    },
    "segments": {
      "count": 1445,
      "memory": "27mb",
      "memory_in_bytes": 28350294,
      "terms_memory": "13.7mb",
      "terms_memory_in_bytes": 14369176,
      "stored_fields_memory": "751.6kb",
      "stored_fields_memory_in_bytes": 769656,
      "term_vectors_memory": "0b",
      "term_vectors_memory_in_bytes": 0,
      "norms_memory": "1.8mb",
      "norms_memory_in_bytes": 1977792,
      "points_memory": "0b",
      "points_memory_in_bytes": 0,
      "doc_values_memory": "10.7mb",
      "doc_values_memory_in_bytes": 11233670,
      "index_writer_memory": "194.1mb",
      "index_writer_memory_in_bytes": 203574398,
      "version_map_memory": "3.5mb",
      "version_map_memory_in_bytes": 3695864,
      "fixed_bit_set": "189.5mb",
      "fixed_bit_set_memory_in_bytes": 198716080,
      "max_unsafe_auto_id_timestamp": 1630368006181,
      "file_sizes": {}
    },
    "mappings": {
      "field_types": [
        {
          "name": "boolean",
          "count": 68,
          "index_count": 26,
          "script_count": 0
        },
        {
          "name": "byte",
          "count": 4,
          "index_count": 2,
          "script_count": 0
        },
        {
          "name": "date",
          "count": 92,
          "index_count": 24,
          "script_count": 0
        },
        {
          "name": "float",
          "count": 74,
          "index_count": 9,
          "script_count": 0
        },
        {
          "name": "half_float",
          "count": 56,
          "index_count": 14,
          "script_count": 0
        },
        {
          "name": "integer",
          "count": 170,
          "index_count": 9,
          "script_count": 0
        },
        {
          "name": "keyword",
          "count": 743,
          "index_count": 28,
          "script_count": 0
        },
        {
          "name": "long",
          "count": 1286,
          "index_count": 26,
          "script_count": 0
        },
        {
          "name": "nested",
          "count": 41,
          "index_count": 11,
          "script_count": 0
        },
        {
          "name": "object",
          "count": 880,
          "index_count": 24,
          "script_count": 0
        },
        {
          "name": "scaled_float",
          "count": 378,
          "index_count": 2,
          "script_count": 0
        },
        {
          "name": "short",
          "count": 2,
          "index_count": 2,
          "script_count": 0
        },
        {
          "name": "text",
          "count": 451,
          "index_count": 19,
          "script_count": 0
        }
      ],
      "runtime_field_types": []
    },
    "analysis": {
      "char_filter_types": [
        {
          "name": "mapping",
          "count": 6,
          "index_count": 6
        }
      ],
      "tokenizer_types": [
        {
          "name": "edge_ngram",
          "count": 8,
          "index_count": 6
        },
        {
          "name": "ngram",
          "count": 4,
          "index_count": 2
        }
      ],
      "filter_types": [
        {
          "name": "edge_ngram",
          "count": 4,
          "index_count": 4
        },
        {
          "name": "shingle",
          "count": 4,
          "index_count": 4
        }
      ],
      "analyzer_types": [
        {
          "name": "custom",
          "count": 58,
          "index_count": 6
        }
      ],
      "built_in_char_filters": [],
      "built_in_tokenizers": [
        {
          "name": "icu_tokenizer",
          "count": 34,
          "index_count": 6
        },
        {
          "name": "keyword",
          "count": 2,
          "index_count": 2
        },
        {
          "name": "standard",
          "count": 4,
          "index_count": 4
        },
        {
          "name": "whitespace",
          "count": 6,
          "index_count": 6
        }
      ],
      "built_in_filters": [
        {
          "name": "asciifolding",
          "count": 2,
          "index_count": 2
        },
        {
          "name": "lowercase",
          "count": 56,
          "index_count": 6
        },
        {
          "name": "reverse",
          "count": 12,
          "index_count": 6
        },
        {
          "name": "stop",
          "count": 12,
          "index_count": 6
        }
      ],
      "built_in_analyzers": []
    },
    "versions": [
      {
        "version": "7.13.0",
        "index_count": 33,
        "primary_shard_count": 95,
        "total_primary_size": "74.3gb",
        "total_primary_bytes": 79793397268
      }
    ]
  },
  "nodes": {
    "count": {
      "total": 10,
      "coordinating_only": 0,
      "data": 5,
      "data_cold": 5,
      "data_content": 5,
      "data_frozen": 5,
      "data_hot": 5,
      "data_warm": 5,
      "ingest": 0,
      "master": 3,
      "ml": 10,
      "remote_cluster_client": 10,
      "transform": 5,
      "voting_only": 0
    },
    "versions": [
      "7.13.0"
    ],
    "os": {
      "available_processors": 50,
      "allocated_processors": 50,
      "names": [
        {
          "name": "Linux",
          "count": 10
        }
      ],
      "pretty_names": [
        {
          "pretty_name": "CentOS Linux 8",
          "count": 10
        }
      ],
      "architectures": [
        {
          "arch": "amd64",
          "count": 10
        }
      ],
      "mem": {
        "total": "188.9gb",
        "total_in_bytes": 202899456000,
        "free": "18.9gb",
        "free_in_bytes": 20378656768,
        "used": "169.9gb",
        "used_in_bytes": 182520799232,
        "free_percent": 10,
        "used_percent": 90
      }
    },
    "process": {
      "cpu": {
        "percent": 112
      },
      "open_file_descriptors": {
        "min": 507,
        "max": 1477,
        "avg": 849
      }
    },
    "jvm": {
      "max_uptime": "53.7d",
      "max_uptime_in_millis": 4646687312,
      "versions": [
        {
          "version": "16",
          "vm_name": "OpenJDK 64-Bit Server VM",
          "vm_version": "16+36",
          "vm_vendor": "AdoptOpenJDK",
          "bundled_jdk": true,
          "using_bundled_jdk": true,
          "count": 10
        }
      ],
      "mem": {
        "heap_used": "46.1gb",
        "heap_used_in_bytes": 49536569584,
        "heap_max": "100gb",
        "heap_max_in_bytes": 107374182400
      },
      "threads": 799
    },
    "fs": {
      "total": "1.6tb",
      "total_in_bytes": 1790086451200,
      "free": "1.4tb",
      "free_in_bytes": 1624459235328,
      "available": "1.3tb",
      "available_in_bytes": 1533291859968
    },
    "plugins": [
      {
        "name": "analysis-phonetic",
        "version": "7.13.0",
        "elasticsearch_version": "7.13.0",
        "java_version": "1.8",
        "description": "The Phonetic Analysis plugin integrates phonetic token filter analysis with elasticsearch.",
        "classname": "org.elasticsearch.plugin.analysis.AnalysisPhoneticPlugin",
        "extended_plugins": [],
        "has_native_controller": false,
        "licensed": false,
        "type": "isolated"
      },
      {
        "name": "analysis-kuromoji",
        "version": "7.13.0",
        "elasticsearch_version": "7.13.0",
        "java_version": "1.8",
        "description": "The Japanese (kuromoji) Analysis plugin integrates Lucene kuromoji analysis module into elasticsearch.",
        "classname": "org.elasticsearch.plugin.analysis.kuromoji.AnalysisKuromojiPlugin",
        "extended_plugins": [],
        "has_native_controller": false,
        "licensed": false,
        "type": "isolated"
      },
      {
        "name": "analysis-icu",
        "version": "7.13.0",
        "elasticsearch_version": "7.13.0",
        "java_version": "1.8",
        "description": "The ICU Analysis plugin integrates the Lucene ICU module into Elasticsearch, adding ICU-related analysis components.",
        "classname": "org.elasticsearch.plugin.analysis.icu.AnalysisICUPlugin",
        "extended_plugins": [],
        "has_native_controller": false,
        "licensed": false,
        "type": "isolated"
      },
      {
        "name": "analysis-ukrainian",
        "version": "7.13.0",
        "elasticsearch_version": "7.13.0",
        "java_version": "1.8",
        "description": "The Ukrainian Analysis plugin integrates the Lucene UkrainianMorfologikAnalyzer into elasticsearch.",
        "classname": "org.elasticsearch.plugin.analysis.ukrainian.AnalysisUkrainianPlugin",
        "extended_plugins": [],
        "has_native_controller": false,
        "licensed": false,
        "type": "isolated"
      },
      {
        "name": "discovery-ec2",
        "version": "7.13.0",
        "elasticsearch_version": "7.13.0",
        "java_version": "1.8",
        "description": "The EC2 discovery plugin allows to use AWS API for the unicast discovery mechanism.",
        "classname": "org.elasticsearch.discovery.ec2.Ec2DiscoveryPlugin",
        "extended_plugins": [],
        "has_native_controller": false,
        "licensed": false,
        "type": "isolated"
      },
      {
        "name": "analysis-stempel",
        "version": "7.13.0",
        "elasticsearch_version": "7.13.0",
        "java_version": "1.8",
        "description": "The Stempel (Polish) Analysis plugin integrates Lucene stempel (polish) analysis module into elasticsearch.",
        "classname": "org.elasticsearch.plugin.analysis.stempel.AnalysisStempelPlugin",
        "extended_plugins": [],
        "has_native_controller": false,
        "licensed": false,
        "type": "isolated"
      }
    ],
    "network_types": {
      "transport_types": {
        "netty4": 10
      },
      "http_types": {
        "netty4": 10
      }
    },
    "discovery_types": {
      "zen": 10
    },
    "packaging_types": [
      {
        "flavor": "default",
        "type": "docker",
        "count": 10
      }
    ],
    "ingest": {
      "number_of_pipelines": 0,
      "processor_stats": {}
    }
  }
}

Here is the node info for my cluster:

ip            heap.percent ram.percent cpu load_1m load_5m load_15m node.role master
x.x.x.x            7          62   4    0.06    0.03     0.00 lmr       *     
x.x.x.x           59          62   0    0.00    0.01     0.00 lmr       -     
x.x.x.x           50          97  18    2.08    2.27     2.36 cdfhlrstw -     
x.x.x.x           37          95  20    2.57    2.48     2.39 cdfhlrstw -     
x.x.x.x           37          64  13    0.18    0.27     0.27 lr        -     
x.x.x.x           49          95  32    1.65    1.91     2.09 cdfhlrstw -     
x.x.x.x           19          98  42    2.62    2.55     2.59 cdfhlrstw -     
x.x.x.x           58          64  13    0.38    0.30     0.24 lr        -     
x.x.x.x           23          98  20    2.77    2.95     3.04 cdfhlrstw -     
x.x.x.x           34          61   0    0.00    0.00     0.00 lmr       -     

Cluster health:

epoch      timestamp cluster               status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1630439287 19:48:07  mycluster             green          10         5    190  95    0    0        0             0                  -                100.0%

Is your data evenly distributed across the cluster?
What do you mean? I run a cluster with 3 master nodes, 2 client nodes, and around 10 data nodes. I can confirm that all the data nodes hold data, but I do not know why some data nodes hold a larger amount of Elasticsearch data than others.

Does this node hold any indices that are frequently queried?
How can I check that?

Do you have any other processes installed on this node that could result in data on disk being read/scanned, e.g. anti-virus or security software?
No.

Any hints for me, @Christian_Dahlqvist, @warkolm?

Do you have any indices that are frequently queried that have more shards on the busy node than on other nodes? Are there any error messages in the logs on that node?
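
As a starting point, something like the following should show where each index's shards live and how heavily each index is searched (host is a placeholder):

curl -s "http://localhost:9200/_cat/shards?v&h=index,shard,prirep,store,node&s=node"
# lists every shard with its size and the node it is allocated to

curl -s "http://localhost:9200/_stats/search?pretty&human"
# per-index search stats (query_total, query_time_in_millis) to spot hot indices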

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.