IOPS for one data node is extremely high while the others are not

Version: Elasticsearch 7.13
I am running 3 master nodes, 2 client nodes, and 10 data nodes (199-399 GB each).
One day I noticed that the IOPS for one data node was extremely high while the others were not.

Could you please help?

Welcome to our community! :smiley:

Is it causing issues?
What do the logs on the node show?
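
It might also be worth comparing the disk I/O counters that Elasticsearch itself reports per node. As a rough sketch (the localhost:9200 host below is just a placeholder for any node in your cluster):

curl -s "http://localhost:9200/_nodes/stats/fs?human&pretty"
# On Linux, the fs.io_stats section lists cumulative read_operations and
# write_operations per device; diffing two snapshots taken a minute apart
# shows which data node is doing the extra I/O.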

Dear Warkolm, we have had some issues, but I am not sure whether they come from this problem.

I can see a log entry like this:

[parent] Data too large, data for [<http_request>] would be [4085636510/3.8gb], which is larger than the limit of [4080218931/3.7gb], real usage: [4085363984/3.8gb], new bytes reserved: [272526/266.1kb], usages [request=220205/215kb, fielddata=0/0b, in_flight_requests=573853808/547.2mb, model_inference=0/0b, accounting=0/0b]\n    at onBody 

That is not related. To help figure it out, what is the output from the _cluster/stats?pretty&human API?
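
If it helps, that can be fetched with curl against any node (the host below is a placeholder):

curl -s "http://localhost:9200/_cluster/stats?pretty&human"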

It looks like a lot of reading with very little writing so that likely rules out merging. Is your data evenly distributed across the cluster? Does this node hold any indices that are frequently queried? Do you have any other processes installed on this node that could result in data on disk being read/scanned, e.g. anti-virus or security software?
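
One rough way to check the distribution is the _cat/allocation API, which lists the shard count and disk usage per node (host is a placeholder):

curl -s "http://localhost:9200/_cat/allocation?v&s=node"
# shows shards, disk.indices, disk.used and disk.percent for each data node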

Here is the output:

{
  "_nodes": {
    "total": 10,
    "successful": 10,
    "failed": 0
  },
  "cluster_name": "prod-es-v7",
  "cluster_uuid": "rXIghHxHRTqlaQLoz6EccA",
  "timestamp": 1630438438358,
  "status": "green",
  "indices": {
    "count": 33,
    "shards": {
      "total": 190,
      "primaries": 95,
      "replication": 1,
      "index": {
        "shards": {
          "min": 2,
          "max": 24,
          "avg": 5.757575757575758
        },
        "primaries": {
          "min": 1,
          "max": 12,
          "avg": 2.878787878787879
        },
        "replication": {
          "min": 1,
          "max": 1,
          "avg": 1
        }
      }
    },
    "docs": {
      "count": 207831318,
      "deleted": 64153677
    },
    "store": {
      "size": "147.5gb",
      "size_in_bytes": 158450575176,
      "total_data_set_size": "147.5gb",
      "total_data_set_size_in_bytes": 158450575176,
      "reserved": "0b",
      "reserved_in_bytes": 0
    },
    "fielddata": {
      "memory_size": "59.1mb",
      "memory_size_in_bytes": 62016424,
      "evictions": 0
    },
    "query_cache": {
      "memory_size": "766.6mb",
      "memory_size_in_bytes": 803845834,
      "total_count": 54719348566,
      "hit_count": 11363845700,
      "miss_count": 43355502866,
      "cache_size": 179848,
      "cache_count": 367497739,
      "evictions": 367317891
    },
    "completion": {
      "size": "0b",
      "size_in_bytes": 0
    },
    "segments": {
      "count": 1445,
      "memory": "27mb",
      "memory_in_bytes": 28350294,
      "terms_memory": "13.7mb",
      "terms_memory_in_bytes": 14369176,
      "stored_fields_memory": "751.6kb",
      "stored_fields_memory_in_bytes": 769656,
      "term_vectors_memory": "0b",
      "term_vectors_memory_in_bytes": 0,
      "norms_memory": "1.8mb",
      "norms_memory_in_bytes": 1977792,
      "points_memory": "0b",
      "points_memory_in_bytes": 0,
      "doc_values_memory": "10.7mb",
      "doc_values_memory_in_bytes": 11233670,
      "index_writer_memory": "194.1mb",
      "index_writer_memory_in_bytes": 203574398,
      "version_map_memory": "3.5mb",
      "version_map_memory_in_bytes": 3695864,
      "fixed_bit_set": "189.5mb",
      "fixed_bit_set_memory_in_bytes": 198716080,
      "max_unsafe_auto_id_timestamp": 1630368006181,
      "file_sizes": {}
    },
    "mappings": {
      "field_types": [
        {
          "name": "boolean",
          "count": 68,
          "index_count": 26,
          "script_count": 0
        },
        {
          "name": "byte",
          "count": 4,
          "index_count": 2,
          "script_count": 0
        },
        {
          "name": "date",
          "count": 92,
          "index_count": 24,
          "script_count": 0
        },
        {
          "name": "float",
          "count": 74,
          "index_count": 9,
          "script_count": 0
        },
        {
          "name": "half_float",
          "count": 56,
          "index_count": 14,
          "script_count": 0
        },
        {
          "name": "integer",
          "count": 170,
          "index_count": 9,
          "script_count": 0
        },
        {
          "name": "keyword",
          "count": 743,
          "index_count": 28,
          "script_count": 0
        },
        {
          "name": "long",
          "count": 1286,
          "index_count": 26,
          "script_count": 0
        },
        {
          "name": "nested",
          "count": 41,
          "index_count": 11,
          "script_count": 0
        },
        {
          "name": "object",
          "count": 880,
          "index_count": 24,
          "script_count": 0
        },
        {
          "name": "scaled_float",
          "count": 378,
          "index_count": 2,
          "script_count": 0
        },
        {
          "name": "short",
          "count": 2,
          "index_count": 2,
          "script_count": 0
        },
        {
          "name": "text",
          "count": 451,
          "index_count": 19,
          "script_count": 0
        }
      ],
      "runtime_field_types": []
    },
    "analysis": {
      "char_filter_types": [
        {
          "name": "mapping",
          "count": 6,
          "index_count": 6
        }
      ],
      "tokenizer_types": [
        {
          "name": "edge_ngram",
          "count": 8,
          "index_count": 6
        },
        {
          "name": "ngram",
          "count": 4,
          "index_count": 2
        }
      ],
      "filter_types": [
        {
          "name": "edge_ngram",
          "count": 4,
          "index_count": 4
        },
        {
          "name": "shingle",
          "count": 4,
          "index_count": 4
        }
      ],
      "analyzer_types": [
        {
          "name": "custom",
          "count": 58,
          "index_count": 6
        }
      ],
      "built_in_char_filters": [],
      "built_in_tokenizers": [
        {
          "name": "icu_tokenizer",
          "count": 34,
          "index_count": 6
        },
        {
          "name": "keyword",
          "count": 2,
          "index_count": 2
        },
        {
          "name": "standard",
          "count": 4,
          "index_count": 4
        },
        {
          "name": "whitespace",
          "count": 6,
          "index_count": 6
        }
      ],
      "built_in_filters": [
        {
          "name": "asciifolding",
          "count": 2,
          "index_count": 2
        },
        {
          "name": "lowercase",
          "count": 56,
          "index_count": 6
        },
        {
          "name": "reverse",
          "count": 12,
          "index_count": 6
        },
        {
          "name": "stop",
          "count": 12,
          "index_count": 6
        }
      ],
      "built_in_analyzers": []
    },
    "versions": [
      {
        "version": "7.13.0",
        "index_count": 33,
        "primary_shard_count": 95,
        "total_primary_size": "74.3gb",
        "total_primary_bytes": 79793397268
      }
    ]
  },
  "nodes": {
    "count": {
      "total": 10,
      "coordinating_only": 0,
      "data": 5,
      "data_cold": 5,
      "data_content": 5,
      "data_frozen": 5,
      "data_hot": 5,
      "data_warm": 5,
      "ingest": 0,
      "master": 3,
      "ml": 10,
      "remote_cluster_client": 10,
      "transform": 5,
      "voting_only": 0
    },
    "versions": [
      "7.13.0"
    ],
    "os": {
      "available_processors": 50,
      "allocated_processors": 50,
      "names": [
        {
          "name": "Linux",
          "count": 10
        }
      ],
      "pretty_names": [
        {
          "pretty_name": "CentOS Linux 8",
          "count": 10
        }
      ],
      "architectures": [
        {
          "arch": "amd64",
          "count": 10
        }
      ],
      "mem": {
        "total": "188.9gb",
        "total_in_bytes": 202899456000,
        "free": "18.9gb",
        "free_in_bytes": 20378656768,
        "used": "169.9gb",
        "used_in_bytes": 182520799232,
        "free_percent": 10,
        "used_percent": 90
      }
    },
    "process": {
      "cpu": {
        "percent": 112
      },
      "open_file_descriptors": {
        "min": 507,
        "max": 1477,
        "avg": 849
      }
    },
    "jvm": {
      "max_uptime": "53.7d",
      "max_uptime_in_millis": 4646687312,
      "versions": [
        {
          "version": "16",
          "vm_name": "OpenJDK 64-Bit Server VM",
          "vm_version": "16+36",
          "vm_vendor": "AdoptOpenJDK",
          "bundled_jdk": true,
          "using_bundled_jdk": true,
          "count": 10
        }
      ],
      "mem": {
        "heap_used": "46.1gb",
        "heap_used_in_bytes": 49536569584,
        "heap_max": "100gb",
        "heap_max_in_bytes": 107374182400
      },
      "threads": 799
    },
    "fs": {
      "total": "1.6tb",
      "total_in_bytes": 1790086451200,
      "free": "1.4tb",
      "free_in_bytes": 1624459235328,
      "available": "1.3tb",
      "available_in_bytes": 1533291859968
    },
    "plugins": [
      {
        "name": "analysis-phonetic",
        "version": "7.13.0",
        "elasticsearch_version": "7.13.0",
        "java_version": "1.8",
        "description": "The Phonetic Analysis plugin integrates phonetic token filter analysis with elasticsearch.",
        "classname": "org.elasticsearch.plugin.analysis.AnalysisPhoneticPlugin",
        "extended_plugins": [],
        "has_native_controller": false,
        "licensed": false,
        "type": "isolated"
      },
      {
        "name": "analysis-kuromoji",
        "version": "7.13.0",
        "elasticsearch_version": "7.13.0",
        "java_version": "1.8",
        "description": "The Japanese (kuromoji) Analysis plugin integrates Lucene kuromoji analysis module into elasticsearch.",
        "classname": "org.elasticsearch.plugin.analysis.kuromoji.AnalysisKuromojiPlugin",
        "extended_plugins": [],
        "has_native_controller": false,
        "licensed": false,
        "type": "isolated"
      },
      {
        "name": "analysis-icu",
        "version": "7.13.0",
        "elasticsearch_version": "7.13.0",
        "java_version": "1.8",
        "description": "The ICU Analysis plugin integrates the Lucene ICU module into Elasticsearch, adding ICU-related analysis components.",
        "classname": "org.elasticsearch.plugin.analysis.icu.AnalysisICUPlugin",
        "extended_plugins": [],
        "has_native_controller": false,
        "licensed": false,
        "type": "isolated"
      },
      {
        "name": "analysis-ukrainian",
        "version": "7.13.0",
        "elasticsearch_version": "7.13.0",
        "java_version": "1.8",
        "description": "The Ukrainian Analysis plugin integrates the Lucene UkrainianMorfologikAnalyzer into elasticsearch.",
        "classname": "org.elasticsearch.plugin.analysis.ukrainian.AnalysisUkrainianPlugin",
        "extended_plugins": [],
        "has_native_controller": false,
        "licensed": false,
        "type": "isolated"
      },
      {
        "name": "discovery-ec2",
        "version": "7.13.0",
        "elasticsearch_version": "7.13.0",
        "java_version": "1.8",
        "description": "The EC2 discovery plugin allows to use AWS API for the unicast discovery mechanism.",
        "classname": "org.elasticsearch.discovery.ec2.Ec2DiscoveryPlugin",
        "extended_plugins": [],
        "has_native_controller": false,
        "licensed": false,
        "type": "isolated"
      },
      {
        "name": "analysis-stempel",
        "version": "7.13.0",
        "elasticsearch_version": "7.13.0",
        "java_version": "1.8",
        "description": "The Stempel (Polish) Analysis plugin integrates Lucene stempel (polish) analysis module into elasticsearch.",
        "classname": "org.elasticsearch.plugin.analysis.stempel.AnalysisStempelPlugin",
        "extended_plugins": [],
        "has_native_controller": false,
        "licensed": false,
        "type": "isolated"
      }
    ],
    "network_types": {
      "transport_types": {
        "netty4": 10
      },
      "http_types": {
        "netty4": 10
      }
    },
    "discovery_types": {
      "zen": 10
    },
    "packaging_types": [
      {
        "flavor": "default",
        "type": "docker",
        "count": 10
      }
    ],
    "ingest": {
      "number_of_pipelines": 0,
      "processor_stats": {}
    }
  }
}

Here is the node info for my cluster:

ip            heap.percent ram.percent cpu load_1m load_5m load_15m node.role master
x.x.x.x            7          62   4    0.06    0.03     0.00 lmr       *     
x.x.x.x           59          62   0    0.00    0.01     0.00 lmr       -     
x.x.x.x           50          97  18    2.08    2.27     2.36 cdfhlrstw -     
x.x.x.x           37          95  20    2.57    2.48     2.39 cdfhlrstw -     
x.x.x.x           37          64  13    0.18    0.27     0.27 lr        -     
x.x.x.x           49          95  32    1.65    1.91     2.09 cdfhlrstw -     
x.x.x.x           19          98  42    2.62    2.55     2.59 cdfhlrstw -     
x.x.x.x           58          64  13    0.38    0.30     0.24 lr        -     
x.x.x.x           23          98  20    2.77    2.95     3.04 cdfhlrstw -     
x.x.x.x           34          61   0    0.00    0.00     0.00 lmr       -     

Cluster health:

epoch      timestamp cluster               status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1630439287 19:48:07  mycluster             green          10         5    190  95    0    0        0             0                  -                100.0%

Is your data evenly distributed across the cluster?
What do you mean? I run a cluster with 3 master nodes, 2 client nodes, and around 10 data nodes. I can confirm that all the data nodes hold data, but I do not know why some data nodes hold a larger amount of Elasticsearch data than others.

Does this node hold any indices that are frequently queried?
How can I check that?

Do you have any other processes installed on this node that could result in data on disk being read/scanned, e.g. anti-virus or security software?
No.

Any hints for me, @Christian_Dahlqvist, @warkolm?

Do you have any indices that are frequently queried that have more shards on the busy node than on other nodes? Are there any error messages in the logs on that node?
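
As a starting point, something like the following should show where each index's shards live and how heavily each index is searched (host is a placeholder):

curl -s "http://localhost:9200/_cat/shards?v&h=index,shard,prirep,store,node&s=node"
# lists every shard with its size and the node it is allocated to

curl -s "http://localhost:9200/_stats/search?pretty&human"
# per-index search stats (query_total, query_time_in_millis) to spot hot indices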

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.