CPU load at 100% after 10-12 hours of usage

Hello, we're switching from ES 5.5 to 7.12. After the switch it works perfectly for about 10-12 hours, and then CPU load reaches 100%.
Even a simple query such as GET /products/_search?size=0 takes 5-6 seconds.
Everything returns to normal within a couple of seconds when we switch back.

ES version: 7.12
number_of_shards: 18
number_of_replicas: 2
number of nodes: 9
index size: 267 GB
RAM: 32GB
Xms16g
Xmx16g

CPUs: 8
Number of documents: 9581754
Indexing rate: 50-80 documents per second (bulk API not used)

Mapping:

    {
      "products" : {
        "mappings" : {
          "dynamic" : "strict",
          "properties" : {
            "categories" : {
              "type" : "long"
            },
            "categoryId" : {
              "type" : "long"
            },
            "ean" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword"
                }
              }
            },
            "features" : {
              "type" : "nested",
              "dynamic" : "strict",
              "properties" : {
                "categoryFeatureId" : {
                  "type" : "long"
                },
                "value" : {
                  "type" : "keyword",
                  "eager_global_ordinals" : true
                }
              }
            },
            "localizedData" : {
              "type" : "nested",
              "dynamic" : "strict",
              "properties" : {
                "category" : {
                  "type" : "text",
                  "fields" : {
                    "keyword" : {
                      "type" : "keyword"
                    }
                  }
                },
                "languageShortCode" : {
                  "type" : "keyword"
                },
                "lifeCycle" : {
                  "type" : "date",
                  "format" : "yyyy-MM-dd HH:mm:ss"
                },
                "supplier" : {
                  "type" : "text",
                  "fields" : {
                    "keyword" : {
                      "type" : "keyword"
                    }
                  }
                },
                "title" : {
                  "type" : "text"
                }
              }
            },
            "mpn" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword"
                }
              },
              "analyzer" : "autoCompleteAnalyzer"
            },
            "numericalFeatures" : {
              "type" : "nested",
              "dynamic" : "strict",
              "properties" : {
                "categoryFeatureId" : {
                  "type" : "long"
                },
                "value" : {
                  "type" : "float"
                }
              }
            },
            "price" : {
              "type" : "float"
            },
            "supplierId" : {
              "type" : "long"
            }
          }
        }
      }
    }

Two typical queries, one used for calculating filters and one for fetching filtered results:

    {
      "aggs": {
        "all": {
          "global": {},
          "aggs": {
            "main": {
              "filter": {
                "bool": {
                  "must": [
                    {
                      "nested": {
                        "path": "localizedData",
                        "query": {
                          "bool": {
                            "must": [
                              {
                                "term": {
                                  "localizedData.languageShortCode": {
                                    "value": "en"
                                  }
                                }
                              },
                              {
                                "bool": {
                                  "should": [
                                    {
                                      "range": {
                                        "localizedData.lifeCycle": {
                                          "lte": "2021-04-23 08:23:32"
                                        }
                                      }
                                    },
                                    {
                                      "bool": {
                                        "must_not": {
                                          "exists": {
                                            "field": "localizedData.lifeCycle"
                                          }
                                        }
                                      }
                                    }
                                  ]
                                }
                              }
                            ]
                          }
                        }
                      }
                    }
                  ]
                }
              },
              "aggs": {
                "general": {
                  "filter": {
                    "bool": {
                      "must": [
                        {
                          "nested": {
                            "path": "features",
                            "query": {
                              "bool": {
                                "must": [
                                  {
                                    "match": {
                                      "features.categoryFeatureId": "36303"
                                    }
                                  },
                                  {
                                    "match": {
                                      "features.value": "y"
                                    }
                                  }
                                ]
                              }
                            }
                          }
                        },
                        {
                          "bool": {
                            "should": [
                              {
                                "term": {
                                  "categoryId": {
                                    "value": 151,
                                    "boost": 1000
                                  }
                                }
                              },
                              {
                                "term": {
                                  "categories": {
                                    "value": 151
                                  }
                                }
                              }
                            ]
                          }
                        }
                      ]
                    }
                  },
                  "aggs": {
                    "price": {
                      "stats": {
                        "field": "price"
                      }
                    },
                    "cat": {
                      "terms": {
                        "field": "categories",
                        "size": 30
                      }
                    },
                    "man": {
                      "terms": {
                        "field": "supplierId",
                        "size": 30
                      }
                    },
                    "nested_features": {
                      "nested": {
                        "path": "features"
                      },
                      "aggs": {
                        "featureId": {
                          "terms": {
                            "field": "features.categoryFeatureId",
                            "size": 100
                          },
                          "aggs": {
                            "featureVal": {
                              "terms": {
                                "field": "features.value",
                                "size": 100
                              }
                            }
                          }
                        }
                      }
                    },
                    "nested_numericalFeatures": {
                      "nested": {
                        "path": "numericalFeatures"
                      },
                      "aggs": {
                        "featureId": {
                          "terms": {
                            "field": "numericalFeatures.categoryFeatureId",
                            "size": 100
                          },
                          "aggs": {
                            "featureVal": {
                              "stats": {
                                "field": "numericalFeatures.value"
                              }
                            }
                          }
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
    {
      "query": {
        "function_score": {
          "query": {
            "bool": {
              "must": [
                {
                  "nested": {
                    "path": "localizedData",
                    "query": {
                      "bool": {
                        "must": [
                          {
                            "term": {
                              "localizedData.languageShortCode": {
                                "value": "en"
                              }
                            }
                          },
                          {
                            "bool": {
                              "should": [
                                {
                                  "range": {
                                    "localizedData.lifeCycle": {
                                      "lte": "2021-04-23 08:26:21"
                                    }
                                  }
                                },
                                {
                                  "bool": {
                                    "must_not": {
                                      "exists": {
                                        "field": "localizedData.lifeCycle"
                                      }
                                    }
                                  }
                                }
                              ]
                            }
                          }
                        ]
                      }
                    }
                  }
                },
                {
                  "nested": {
                    "path": "features",
                    "query": {
                      "bool": {
                        "must": [
                          {
                            "match": {
                              "features.categoryFeatureId": "36303"
                            }
                          },
                          {
                            "match": {
                              "features.value": "y"
                            }
                          }
                        ]
                      }
                    }
                  }
                },
                {
                  "bool": {
                    "should": [
                      {
                        "term": {
                          "categoryId": {
                            "value": 151,
                            "boost": 1000
                          }
                        }
                      },
                      {
                        "term": {
                          "categories": {
                            "value": 151
                          }
                        }
                      }
                    ]
                  }
                }
              ]
            }
          },
          "functions": [
            {
              "filter": {
                "nested": {
                  "path": "features",
                  "query": {
                    "exists": {
                      "field": "features.categoryFeatureId"
                    }
                  }
                }
              },
              "weight": 50
            },
            {
              "filter": {
                "match": {
                  "quality": 2
                }
              },
              "weight": 20
            },
            {
              "filter": {
                "match": {
                  "quality": 1
                }
              },
              "weight": 10
            }
          ],
          "boost_mode": "sum"
        }
      }
    }

Stats screenshots uploaded.

Please don't post pictures of text; they are difficult to read, impossible to search and replicate (if it's code), and some people may not even be able to see them :)

Switch back to what exactly?

What is the output from the _cluster/stats?pretty&human API?

Thanks, Mark, for the reply.

Sorry for the images. That was cheating on my part: I thought it was a good idea to get around the post's character limit by putting some of the text into an image :)

By "switch back" I mean switching back to ES 5.5.

However, there is one more detail: it really depends on how heavily we are adding and updating documents. Once we stop adding new documents, the cluster returns to its normal state. As I said earlier, we don't use bulk requests.

The average JSON size of a single document is 30-50 KB.

Output of _cluster/stats?pretty&human:

{
  "_nodes" : {
    "total" : 15,
    "successful" : 15,
    "failed" : 0
  },
  "cluster_name" : "fo-elastic-7",
  "cluster_uuid" : "iqK7WjT3Sly2xfpTanYxqA",
  "timestamp" : 1619501194396,
  "status" : "green",
  "indices" : {
    "count" : 10,
    "shards" : {
      "total" : 122,
      "primaries" : 43,
      "replication" : 1.8372093023255813,
      "index" : {
        "shards" : {
          "min" : 2,
          "max" : 54,
          "avg" : 12.2
        },
        "primaries" : {
          "min" : 1,
          "max" : 18,
          "avg" : 4.3
        },
        "replication" : {
          "min" : 1.0,
          "max" : 2.0,
          "avg" : 1.3
        }
      }
    },
    "docs" : {
      "count" : 802738940,
      "deleted" : 156500494
    },
    "store" : {
      "size" : "852.3gb",
      "size_in_bytes" : 915170532682,
      "reserved" : "0b",
      "reserved_in_bytes" : 0
    },
    "fielddata" : {
      "memory_size" : "3.1mb",
      "memory_size_in_bytes" : 3336504,
      "evictions" : 0
    },
    "query_cache" : {
      "memory_size" : "949.5mb",
      "memory_size_in_bytes" : 995719865,
      "total_count" : 17039615,
      "hit_count" : 349049,
      "miss_count" : 16690566,
      "cache_size" : 9562,
      "cache_count" : 9562,
      "evictions" : 0
    },
    "completion" : {
      "size" : "69.7gb",
      "size_in_bytes" : 74904024220
    },
    "segments" : {
      "count" : 1546,
      "memory" : "69.7gb",
      "memory_in_bytes" : 74919351030,
      "terms_memory" : "69.7gb",
      "terms_memory_in_bytes" : 74910786972,
      "stored_fields_memory" : "951.8kb",
      "stored_fields_memory_in_bytes" : 974656,
      "term_vectors_memory" : "0b",
      "term_vectors_memory_in_bytes" : 0,
      "norms_memory" : "761.2kb",
      "norms_memory_in_bytes" : 779520,
      "points_memory" : "0b",
      "points_memory_in_bytes" : 0,
      "doc_values_memory" : "6.4mb",
      "doc_values_memory_in_bytes" : 6809882,
      "index_writer_memory" : "234.6mb",
      "index_writer_memory_in_bytes" : 246066754,
      "version_map_memory" : "10.6mb",
      "version_map_memory_in_bytes" : 11173852,
      "fixed_bit_set" : "342.2mb",
      "fixed_bit_set_memory_in_bytes" : 358903912,
      "max_unsafe_auto_id_timestamp" : -1,
      "file_sizes" : { }
    },
    "mappings" : {
      "field_types" : [
        {
          "name" : "boolean",
          "count" : 3,
          "index_count" : 2
        },
        {
          "name" : "completion",
          "count" : 1,
          "index_count" : 1
        },
        {
          "name" : "date",
          "count" : 12,
          "index_count" : 4
        },
        {
          "name" : "float",
          "count" : 2,
          "index_count" : 1
        },
        {
          "name" : "keyword",
          "count" : 39,
          "index_count" : 5
        },
        {
          "name" : "long",
          "count" : 16,
          "index_count" : 4
        },
        {
          "name" : "nested",
          "count" : 6,
          "index_count" : 2
        },
        {
          "name" : "object",
          "count" : 7,
          "index_count" : 2
        },
        {
          "name" : "text",
          "count" : 21,
          "index_count" : 5
        }
      ]
    },
    "analysis" : {
      "char_filter_types" : [ ],
      "tokenizer_types" : [
        {
          "name" : "edge_ngram",
          "count" : 1,
          "index_count" : 1
        }
      ],
      "filter_types" : [ ],
      "analyzer_types" : [
        {
          "name" : "custom",
          "count" : 1,
          "index_count" : 1
        }
      ],
      "built_in_char_filters" : [ ],
      "built_in_tokenizers" : [ ],
      "built_in_filters" : [ ],
      "built_in_analyzers" : [
        {
          "name" : "standard",
          "count" : 1,
          "index_count" : 1
        }
      ]
    },
    "versions" : [
      {
        "version" : "7.12.0",
        "index_count" : 10,
        "primary_shard_count" : 43,
        "total_primary_size" : "284gb",
        "total_primary_bytes" : 304983067844
      }
    ]
  },
  "nodes" : {
    "count" : {
      "total" : 15,
      "coordinating_only" : 3,
      "data" : 9,
      "data_cold" : 0,
      "data_content" : 0,
      "data_frozen" : 0,
      "data_hot" : 0,
      "data_warm" : 0,
      "ingest" : 3,
      "master" : 3,
      "ml" : 0,
      "remote_cluster_client" : 0,
      "transform" : 0,
      "voting_only" : 0
    },
    "versions" : [
      "7.12.0"
    ],
    "os" : {
      "available_processors" : 102,
      "allocated_processors" : 102,
      "names" : [
        {
          "name" : "Linux",
          "count" : 15
        }
      ],
      "pretty_names" : [
        {
          "pretty_name" : "Debian GNU/Linux 10 (buster)",
          "count" : 15
        }
      ],
      "architectures" : [
        {
          "arch" : "amd64",
          "count" : 15
        }
      ],
      "mem" : {
        "total" : "352.2gb",
        "total_in_bytes" : 378177728512,
        "free" : "12.1gb",
        "free_in_bytes" : 13011701760,
        "used" : "340gb",
        "used_in_bytes" : 365166026752,
        "free_percent" : 3,
        "used_percent" : 97
      }
    },
    "process" : {
      "cpu" : {
        "percent" : 205
      },
      "open_file_descriptors" : {
        "min" : 612,
        "max" : 910,
        "avg" : 784
      }
    },
    "jvm" : {
      "max_uptime" : "6.8d",
      "max_uptime_in_millis" : 592985374,
      "versions" : [
        {
          "version" : "15.0.1",
          "vm_name" : "OpenJDK 64-Bit Server VM",
          "vm_version" : "15.0.1+9",
          "vm_vendor" : "AdoptOpenJDK",
          "bundled_jdk" : true,
          "using_bundled_jdk" : true,
          "count" : 15
        }
      ],
      "mem" : {
        "heap_used" : "113.2gb",
        "heap_used_in_bytes" : 121643914456,
        "heap_max" : "180gb",
        "heap_max_in_bytes" : 193273528320
      },
      "threads" : 997
    },
    "fs" : {
      "total" : "1.8tb",
      "total_in_bytes" : 2043039191040,
      "free" : "947.9gb",
      "free_in_bytes" : 1017861378048,
      "available" : "942.5gb",
      "available_in_bytes" : 1012068225024
    },
    "plugins" : [ ],
    "network_types" : {
      "transport_types" : {
        "security4" : 15
      },
      "http_types" : {
        "security4" : 15
      }
    },
    "discovery_types" : {
      "zen" : 15
    },
    "packaging_types" : [
      {
        "flavor" : "default",
        "type" : "deb",
        "count" : 15
      }
    ],
    "ingest" : {
      "number_of_pipelines" : 2,
      "processor_stats" : {
        "gsub" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time" : "0s",
          "time_in_millis" : 0
        },
        "script" : {
          "count" : 0,
          "failed" : 0,
          "current" : 0,
          "time" : "0s",
          "time_in_millis" : 0
        }
      }
    }
  }
}
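
A couple of ratios can be read off the stats above. The following is only a small sketch of that arithmetic, using figures copied from the output; the variable names are illustrative and not part of any Elasticsearch API.

```python
# Figures copied from the _cluster/stats output above.
query_cache = {"hit_count": 349049, "miss_count": 16690566}
docs = {"count": 802738940, "deleted": 156500494}


def ratio(part, whole):
    """Return part / whole, or 0.0 when whole is zero."""
    return part / whole if whole else 0.0


# Share of query cache lookups that were served from the cache.
cache_hit_ratio = ratio(
    query_cache["hit_count"],
    query_cache["hit_count"] + query_cache["miss_count"],
)

# Share of all Lucene documents that are deleted but not yet merged away.
deleted_ratio = ratio(docs["deleted"], docs["count"] + docs["deleted"])

print(f"query cache hit ratio: {cache_hit_ratio:.1%}")
print(f"deleted docs share: {deleted_ratio:.1%}")
```

With these numbers the query cache hit ratio comes out at roughly 2% and deleted documents make up roughly 16% of the total, which is consistent with an index that receives many single-document updates.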

That's far from ideal then. I would start there.

Thank you Mark,

Sending bulk requests of ~7 MB each increased our indexing speed.
Unfortunately, the problem is not gone.
It seems the outages are not related (or only slightly related) to adding or modifying data.
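
For readers who want to try the same change, here is a minimal sketch of how documents could be grouped into bulk request bodies of about 7 MB. It only builds the newline-delimited JSON that the `_bulk` endpoint expects; the function name `bulk_bodies`, the default index name and the size limit are assumptions for illustration, and sending the body over HTTP is left out.

```python
import json


def bulk_bodies(docs, index="products", max_bytes=7 * 1024 * 1024):
    """Yield NDJSON bodies for the _bulk API, each at most about max_bytes.

    docs is an iterable of (doc_id, source_dict) pairs.
    """
    lines, size = [], 0
    for doc_id, source in docs:
        action = json.dumps({"index": {"_index": index, "_id": doc_id}})
        payload = json.dumps(source)
        # +2 accounts for the newline that terminates each of the two lines.
        chunk = len(action.encode("utf-8")) + len(payload.encode("utf-8")) + 2
        if lines and size + chunk > max_bytes:
            yield "\n".join(lines) + "\n"
            lines, size = [], 0
        lines += [action, payload]
        size += chunk
    if lines:
        # The bulk body must end with a newline.
        yield "\n".join(lines) + "\n"


# Usage: a tiny limit is used here only to show the splitting.
sample = [(i, {"price": float(i)}) for i in range(5)]
for body in bulk_bodies(sample, max_bytes=120):
    print(len(body), "bytes")
```

Each yielded string would be sent as the body of one `POST /_bulk` request with the `Content-Type: application/x-ndjson` header.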

What we see now:
Every hour, CPU usage on the data nodes rises from 10% to 100%.
After reloading the php-fpm processes, CPU load stays normal for 1 hour.
If we don't reload php-fpm, CPU load returns to normal after 1.5-2 hours.

Logs on the HTTP nodes show elasticsearch.action.search.SearchPhaseExecutionException: all shards failed
curl /_cluster/health/?level=shards reports that everything is green and started:

"status":"green","primary_active":true,"active_shards":3,"relocating_shards":0,"initializing_shards":0,"unassigned_shards":0

Interestingly, after a reload it works perfectly for 1 hour, which makes me think some internal Elasticsearch process is involved.

I would suggest using the latest 7.12.1; I've seen a few other topics with similar issues that an upgrade fixed.

Dear Mark,

Thanks a lot for your support. Unfortunately, this didn't help.
We're going to experiment with the mappings and reduce the number of nested objects.

Hello Mark and everyone interested,

I'd like to share my experience; it may be helpful.
Once I figured out that every entry in a nested field is indexed as a separate document, I tried to reduce the number of nested fields in my mapping.

I didn't succeed with the flattened type (aggregations were very slow).
I simply replaced the nested fields with arrays of keywords.
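
The thread does not say how the values were encoded, so the following is only one possible sketch of turning a nested `features` array into an array of keywords. The `id:value` token format and the `featureTokens` field name are hypothetical, chosen purely for illustration.

```python
def flatten_features(features):
    """Turn [{"categoryFeatureId": 36303, "value": "y"}, ...] into ["36303:y", ...].

    The "id:value" token format is an assumption, not the poster's actual scheme.
    """
    return [f'{f["categoryFeatureId"]}:{f["value"]}' for f in features]


product = {
    "features": [
        {"categoryFeatureId": 36303, "value": "y"},
        {"categoryFeatureId": 1200, "value": "black"},
    ]
}

# Replace the nested array with a flat keyword array (hypothetical field name).
product["featureTokens"] = flatten_features(product.pop("features"))
print(product)  # {'featureTokens': ['36303:y', '1200:black']}
```

With such an encoding, a filter that previously needed a nested query on `features.categoryFeatureId` and `features.value` could become a single term query on the keyword field, at the cost of no longer being able to aggregate on the ID and the value separately.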

As a result, the total number of documents dropped from ~900M to 41615457 (GET /products/_search shows 10362012).
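
The gap between the two counts comes from each nested object being stored as its own Lucene document next to the parent. A rough sketch of that arithmetic, using the nested paths from the mapping earlier in the thread and the figures reported here (the helper name `lucene_doc_count` is made up for illustration):

```python
def lucene_doc_count(product, nested_paths=("features", "localizedData", "numericalFeatures")):
    """One Lucene document for the product itself plus one per nested object."""
    return 1 + sum(len(product.get(path, [])) for path in nested_paths)


# A made-up product with 2 features, 1 localization and 3 numerical features.
example = {
    "features": [{}, {}],
    "localizedData": [{}],
    "numericalFeatures": [{}, {}, {}],
}
print(lucene_doc_count(example))  # 7

# Figures reported in the thread.
top_level = 10_362_012       # hits from GET /products/_search
total_before = 900_000_000   # "~900M", approximate
total_after = 41_615_457

avg_before = total_before / top_level
avg_after = total_after / top_level
print(f"before: ~{avg_before:.0f} Lucene docs per product, after: ~{avg_after:.0f}")
```

On these numbers the average went from roughly 87 Lucene documents per product to about 4.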

The index is working like a charm now.

Thanks for the support.


Thanks for updating the topic!
