Elasticsearch document updates are very slow

It now takes as long as about 7 minutes to update a single document.
Single node, 8 GB heap, 30 shards, about 6.2 GB of data, 2 million documents. This is far too slow; if anyone can tell me how to fix it, I would appreciate it.

iostat

Linux 4.15.0-20-generic (max-master) 	12/18/2020 	_x86_64_	(32 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          15.46    0.00    8.95    0.09    0.00   75.51

Device             tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda             222.31        92.21      5448.83   23575235 1393042693

cat _nodes/hot_threads

::: {es-node1}{VxAcSdn8QJKRBu705pFVDA}{ozapTgHcRDaIgu9Cb-tB9w}{10.244.0.19}{10.244.0.19:9300}{ml.machine_memory=66933485568, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}
   Hot threads at 2020-12-18T08:36:29.005Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:
   
   100.4% (502ms out of 500ms) cpu usage by thread 'elasticsearch[es-node1][write][T#7]'
     2/10 snapshots sharing following 205 elements
       ...
   22.7% (113.6ms out of 500ms) cpu usage by thread 'elasticsearch[es-node1][write][T#4]'
     6/10 snapshots sharing following 192 elements
       ...

cat /_nodes/stats/thread_pool?human&pretty

// 20201218163630
// http://10.0.17.21:9200/_nodes/stats/thread_pool?human&pretty

{
  "_nodes": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "cluster_name": "sd",
  "nodes": {
    "VxAcSdn8QJKRBu705pFVDA": {
      "timestamp": 1608280589005,
      "name": "es-node1",
      "transport_address": "10.244.0.19:9300",
      "host": "10.244.0.19",
      "ip": "10.244.0.19:9300",
      "roles": [
        "master",
        "data",
        "ingest"
      ],
      "attributes": {
        "ml.machine_memory": "66933485568",
        "xpack.installed": "true",
        "ml.max_open_jobs": "20",
        "ml.enabled": "true"
      },
      "thread_pool": {
        "analyze": {
          "threads": 0,
          "queue": 0,
          "active": 0,
          "rejected": 0,
          "largest": 0,
          "completed": 0
        },
        "ccr": {
          "threads": 0,
          "queue": 0,
          "active": 0,
          "rejected": 0,
          "largest": 0,
          "completed": 0
        },
        "fetch_shard_started": {
          "threads": 1,
          "queue": 0,
          "active": 0,
          "rejected": 0,
          "largest": 16,
          "completed": 167
        },
        "fetch_shard_store": {
          "threads": 0,
          "queue": 0,
          "active": 0,
          "rejected": 0,
          "largest": 0,
          "completed": 0
        },
        "flush": {
          "threads": 1,
          "queue": 0,
          "active": 0,
          "rejected": 0,
          "largest": 4,
          "completed": 368
        },
        "force_merge": {
          "threads": 0,
          "queue": 0,
          "active": 0,
          "rejected": 0,
          "largest": 0,
          "completed": 0
        },
        "generic": {
          "threads": 24,
          "queue": 0,
          "active": 1,
          "rejected": 0,
          "largest": 24,
          "completed": 10495
        },
        "get": {
          "threads": 8,
          "queue": 0,
          "active": 0,
          "rejected": 0,
          "largest": 8,
          "completed": 389
        },
        "index": {
          "threads": 0,
          "queue": 0,
          "active": 0,
          "rejected": 0,
          "largest": 0,
          "completed": 0
        },
        "listener": {
          "threads": 3,
          "queue": 0,
          "active": 0,
          "rejected": 0,
          "largest": 3,
          "completed": 3
        },
        "management": {
          "threads": 5,
          "queue": 0,
          "active": 2,
          "rejected": 0,
          "largest": 5,
          "completed": 7194
        },
        "ml_autodetect": {
          "threads": 0,
          "queue": 0,
          "active": 0,
          "rejected": 0,
          "largest": 0,
          "completed": 0
        },
        "ml_datafeed": {
          "threads": 0,
          "queue": 0,
          "active": 0,
          "rejected": 0,
          "largest": 0,
          "completed": 0
        },
        "ml_utility": {
          "threads": 1,
          "queue": 0,
          "active": 0,
          "rejected": 0,
          "largest": 1,
          "completed": 1
        },
        "refresh": {
          "threads": 4,
          "queue": 0,
          "active": 0,
          "rejected": 0,
          "largest": 4,
          "completed": 51313
        },
        "rollup_indexing": {
          "threads": 0,
          "queue": 0,
          "active": 0,
          "rejected": 0,
          "largest": 0,
          "completed": 0
        },
        "search": {
          "threads": 13,
          "queue": 0,
          "active": 1,
          "rejected": 0,
          "largest": 13,
          "completed": 217319
        },
        "search_throttled": {
          "threads": 0,
          "queue": 0,
          "active": 0,
          "rejected": 0,
          "largest": 0,
          "completed": 0
        },
        "security-token-key": {
          "threads": 0,
          "queue": 0,
          "active": 0,
          "rejected": 0,
          "largest": 0,
          "completed": 0
        },
        "snapshot": {
          "threads": 0,
          "queue": 0,
          "active": 0,
          "rejected": 0,
          "largest": 0,
          "completed": 0
        },
        "warmer": {
          "threads": 4,
          "queue": 0,
          "active": 0,
          "rejected": 0,
          "largest": 4,
          "completed": 399
        },
        "watcher": {
          "threads": 0,
          "queue": 0,
          "active": 0,
          "rejected": 0,
          "largest": 0,
          "completed": 0
        },
        "write": {
          "threads": 8,
          "queue": 0,
          "active": 6,
          "rejected": 0,
          "largest": 8,
          "completed": 1401
        }
      }
    }
  }
}

What is the size and complexity of the document(s)? Are you using nested mappings?

Which version of Elasticsearch are you using? What is the load on the cluster?

What is the hardware specification of the host you are running Elasticsearch on? What type of storage are you using?

What is the output of the cluster stats API?
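
For reference, the cluster stats can be fetched with something like the following (the host and port are taken from the node stats URL posted earlier in the thread):

curl 'http://10.0.17.21:9200/_cluster/stats?human&pretty'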

A document is about 20 KB. The entire single node holds about 6.2 GB of data and 2 million documents in total, and we are not using nested mappings.

Elasticsearch version: 6.8.6. The entire cluster is under high load, but CPU and memory usage are not high.


Elasticsearch runs as a Docker container. The host has a 16-core / 32-thread CPU, 64 GB of memory, and a 2 TB HDD.

_cluster/stats:


{
  "_nodes": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "cluster_name": "zs",
  "cluster_uuid": "AyUf9oysQJWFzx4pRMUuNw",
  "timestamp": 1608381130581,
  "status": "yellow",
  "indices": {
    "count": 47,
    "shards": {
      "total": 167,
      "primaries": 167,
      "replication": 0.0,
      "index": {
        "shards": {
          "min": 1,
          "max": 5,
          "avg": 3.5531914893617023
        },
        "primaries": {
          "min": 1,
          "max": 5,
          "avg": 3.5531914893617023
        },
        "replication": {
          "min": 0.0,
          "max": 0.0,
          "avg": 0.0
        }
      }
    },
    "docs": {
      "count": 27667108,
      "deleted": 117889
    },
    "store": {
      "size_in_bytes": 7299417551
    },
    "fielddata": {
      "memory_size_in_bytes": 672736,
      "evictions": 0
    },
    "query_cache": {
      "memory_size_in_bytes": 1124944,
      "total_count": 4706044,
      "hit_count": 733504,
      "miss_count": 3972540,
      "cache_size": 2299,
      "cache_count": 2322,
      "evictions": 23
    },
    "completion": {
      "size_in_bytes": 0
    },
    "segments": {
      "count": 1090,
      "memory_in_bytes": 61900764,
      "terms_memory_in_bytes": 52703930,
      "stored_fields_memory_in_bytes": 2370744,
      "term_vectors_memory_in_bytes": 0,
      "norms_memory_in_bytes": 3521600,
      "points_memory_in_bytes": 364378,
      "doc_values_memory_in_bytes": 2940112,
      "index_writer_memory_in_bytes": 0,
      "version_map_memory_in_bytes": 0,
      "fixed_bit_set_memory_in_bytes": 72232,
      "max_unsafe_auto_id_timestamp": 1594039534002,
      "file_sizes": {
        
      }
    }
  },
  "nodes": {
    "count": {
      "total": 1,
      "data": 1,
      "coordinating_only": 0,
      "master": 1,
      "ingest": 1
    },
    "versions": [
      "6.8.6"
    ],
    "os": {
      "available_processors": 16,
      "allocated_processors": 16,
      "names": [
        {
          "name": "Linux",
          "count": 1
        }
      ],
      "pretty_names": [
        {
          "pretty_name": "CentOS Linux 7 (Core)",
          "count": 1
        }
      ],
      "mem": {
        "total_in_bytes": 66933485568,
        "free_in_bytes": 12921643008,
        "used_in_bytes": 54011842560,
        "free_percent": 19,
        "used_percent": 81
      }
    },
    "process": {
      "cpu": {
        "percent": 0
      },
      "open_file_descriptors": {
        "min": 4973,
        "max": 4973,
        "avg": 4973
      }
    },
    "jvm": {
      "max_uptime_in_millis": 95129535,
      "versions": [
        {
          "version": "13.0.1",
          "vm_name": "OpenJDK 64-Bit Server VM",
          "vm_version": "13.0.1+9",
          "vm_vendor": "AdoptOpenJDK",
          "count": 1
        }
      ],
      "mem": {
        "heap_used_in_bytes": 4278952440,
        "heap_max_in_bytes": 8476557312
      },
      "threads": 182
    },
    "fs": {
      "total_in_bytes": 2830734360576,
      "free_in_bytes": 2575955050496,
      "available_in_bytes": 2432089849856
    },
    "plugins": [
      
    ],
    "network_types": {
      "transport_types": {
        "security4": 1
      },
      "http_types": {
        "security4": 1
      }
    }
  }
}

If you are using a spinning disk and see high load, have a look at disk I/O and iowait, e.g. using iostat -x. It is quite possible that you are limited by disk I/O. How do you perform the update? Do you force a refresh?

iostat -x:

Linux 4.15.0-20-generic (max-master) 	12/19/2020 	_x86_64_	(32 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          59.95    0.00    4.57    0.02    0.00   35.46

Device            r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
sda              4.90  103.64     75.21   5936.18     0.36    51.40   6.88  33.15    1.04    4.76   0.50    15.34    57.28   0.09   0.99

There are 20 threads that update data through the bulk API, and each bulk request contains about 120 updates. Because our system has strict real-time requirements, we force a refresh after each bulk request.
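
Roughly, each bulk request follows the pattern sketched below. The index name, type, document IDs, and field values are placeholders, and the host/port is the one from the node stats URL above; this only illustrates the update-plus-forced-refresh pattern, not our exact requests.

# Build a small bulk body of update actions (ours contain ~120 of these pairs)
cat > batch.ndjson <<'EOF'
{ "update": { "_index": "my-index", "_type": "_doc", "_id": "1" } }
{ "doc": { "status": "updated" } }
{ "update": { "_index": "my-index", "_type": "_doc", "_id": "2" } }
{ "doc": { "status": "updated" } }
EOF

# Send it with refresh=true so the changes are searchable immediately after the call
curl -s -X POST 'http://10.0.17.21:9200/_bulk?refresh=true' \
     -H 'Content-Type: application/x-ndjson' \
     --data-binary @batch.ndjson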

Is that captured while you are experiencing slow updates? Can you run it with the -d flag as well to capture a few measurements?
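
For example, something like the following, started while the slow updates are running (the 5-second interval and count of 10 are just example values):

iostat -x -d 5 10

This prints extended per-device statistics every 5 seconds for 10 intervals, so you get numbers for the period of interest (after the first report, which covers the time since boot) rather than a single average since boot.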

I just discovered that one of the documents is about 100 MB. The whole process uses about 20 threads and sends a total of roughly 6,000 bulk update requests, and that large document is updated in every bulk request.

Is it because of this large document that the bulk update is very slow?

Most likely.

Are there any good optimization suggestions for this kind of large document update operation?

Not that I know of. Check the logs for signs of long or frequent GC, which could indicate you need a larger heap.
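
For example, something along these lines; the container name and log path below are placeholders, since the official Docker image logs to stdout by default:

# If Elasticsearch logs to stdout (the Docker default), grep the container logs:
docker logs <es-container-name> 2>&1 | grep -i 'gc'

# If file logging is enabled, the log directory is typically /usr/share/elasticsearch/logs:
grep -i 'gc' /usr/share/elasticsearch/logs/*.log

Long or frequent collections show up as "[gc][...] overhead, spent [...]" warnings from the JVM GC monitor.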

Frequently updating the same document(s) is generally very slow, as it results in many small and expensive refreshes and corresponding merges. I believe this was optimized in a recent version, so I would recommend upgrading to the latest 7.x version. Given that you are issuing a refresh after each operation, though, you may be paying this performance penalty even with a newer version.
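
One thing you could experiment with, if the real-time requirement is only that an update is visible to the next search, is passing refresh=wait_for on the bulk request instead of forcing an immediate refresh; the call then returns once a refresh has made the change searchable, letting Elasticsearch fold the refresh into its normal schedule. This is just a suggestion to test, not something guaranteed to help with a 100 MB document:

curl -s -X POST 'http://10.0.17.21:9200/_bulk?refresh=wait_for' \
     -H 'Content-Type: application/x-ndjson' \
     --data-binary @batch.ndjson

(batch.ndjson here stands for the same placeholder bulk body as in the sketch earlier in the thread.)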

I truly appreciate your timely help.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.