Batch Upload with Enrich Policy Suddenly Taking Much Longer

Hello all,

My team has been running a single-VM Elasticsearch instance for about a year, and we are currently facing an issue when batch uploading documents to an index that uses an enrich policy.

To help in this discussion:
Index "E": The index that is referenced from an enrichment policy. This index is re-created every night. (~3.5 million docs with a total size of 1.3GB).
Index "D": The "dependent" index. The index that, when batch uploading, references the enrichment policy that uses index E. This index is recreated every 20 minutes. (~1.5 million docs with a total size of about 229 MB)

Again, we update index "E" every night. We have noticed that shortly after index E is refreshed, the batch upload to index D never completes. (The upload is hosted in an Azure Function, so it times out.)

I know that this isn't an issue with Elasticsearch, and most likely not an issue with the configuration of the instance, since the only delta from operational to non-operational is the update of index E. My question is what I should be looking out for as I continue to triage this issue, in relation to the data?

Do duplicates in index E cause issues when uploading documents to index D?
Do duplicates in index D cause issues when enriching from index E?
If there is little overlap between index D and E, does this cause an increase in runtime?
What other properties of the data stored in index D and index E could cause a sharp increase in ingest time of index D?
Is the enrich policy in a bad state?
Are there any elastic logs that may describe why the upload of documents to index D is taking so long?
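
As a starting point for the data questions above, here is a minimal sketch that profiles the match key in two sets of documents before they are uploaded. It assumes the documents for D and E are available as lists of Python dicts with a `nodeId` field; the function name `profile_keys` and the keys of the returned report are made up for illustration and are not part of Elasticsearch.

```python
from collections import Counter


def profile_keys(d_docs, e_docs, key="nodeId"):
    """Summarize duplicates and overlap of `key` between two lists of dicts.

    d_docs: documents destined for index D (hypothetical input).
    e_docs: documents destined for index E (hypothetical input).
    """
    # Count how often each key value occurs in each data set.
    d_counts = Counter(doc[key] for doc in d_docs if key in doc)
    e_counts = Counter(doc[key] for doc in e_docs if key in doc)
    d_keys, e_keys = set(d_counts), set(e_counts)
    return {
        # Key values that occur more than once, with their counts.
        "d_duplicated_keys": {k: n for k, n in d_counts.items() if n > 1},
        "e_duplicated_keys": {k: n for k, n in e_counts.items() if n > 1},
        # Distinct D keys that have no counterpart in E.
        "d_keys_without_match": len(d_keys - e_keys),
        # Fraction of distinct D keys that also appear in E.
        "overlap_ratio": len(d_keys & e_keys) / len(d_keys) if d_keys else 0.0,
    }


if __name__ == "__main__":
    d = [{"nodeId": "a"}, {"nodeId": "a"}, {"nodeId": "b"}]
    e = [{"nodeId": "a"}, {"nodeId": "c"}, {"nodeId": "c"}]
    print(profile_keys(d, e))
```

Comparing this report for a night where the upload works against a night where it times out would show whether the duplicate counts or the overlap changed with the refresh of index E.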

Apologies for the fairly open-ended question, but I am currently at quite a loss, and I am really hoping for any help on the matter.

Thank you,
CJ

(Also, when observing the state of the VM during the upload of D, the basic metrics [CPU %, memory, disk IO] appear to be at normal levels.)

Here is the definition of the indices, enrich policy, etc.

// Index E:
{
  "node_attributes-39db4897-69ee-447f-ba98-f331fe6048f7" : {
    "aliases" : {
      "node_attributes" : { }
    },
    "mappings" : {
      "properties" : {
        "bios" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "bmc" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "cecFirmware" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "cerberusVersion" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "clusterName" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "cpuDescription" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "cpuId" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "dataCenter" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "gpuFirmware" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "gpuFpgaFirmware" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "hardwareSku" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "isStorageSku" : {
          "type" : "boolean"
        },
        "isUtilitySku" : {
          "type" : "boolean"
        },
        "machinePool" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "microcode" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "nitroFirmware" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "nodeId" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "nvSwitchFirmware" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "os" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "rackLocation" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "region" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "retimerFirmware" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "socFirmware" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "ssdFirmware" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "ssdModels" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "routing" : {
          "allocation" : {
            "include" : {
              "_tier_preference" : "data_content"
            }
          }
        },
        "number_of_shards" : "1",
        "provided_name" : "node_attributes-39db4897-69ee-447f-ba98-f331fe6048f7",
        "max_result_window" : "12000",
        "creation_date" : "1670242201793",
        "number_of_replicas" : "0",
        "uuid" : "Z47NyUMzT92HAqUaDfnDTA",
        "version" : {
          "created" : "7170599"
        }
      }
    }
  }
}
// Index D:
{
  "node_availability-05511203512022" : {
    "aliases" : {
      "node_availability" : { }
    },
    "mappings" : {
      "properties" : {
        "clusterName" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "diskConfiguration" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "isAvailable" : {
          "type" : "boolean"
        },
        "isContainerCountZero" : {
          "type" : "boolean"
        },
        "isTipEmpty" : {
          "type" : "boolean"
        },
        "nodeId" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "routing" : {
          "allocation" : {
            "include" : {
              "_tier_preference" : "data_content"
            }
          }
        },
        "number_of_shards" : "1",
        "provided_name" : "node_availability-05511203512022",
        "max_result_window" : "12000",
        "creation_date" : "1670071867819",
        "number_of_replicas" : "0",
        "uuid" : "SEayVfIxRhOSPKviXmqUWQ",
        "version" : {
          "created" : "7170599"
        }
      }
    }
  }
}
// Enrich Policy:
{
  "config" : {
    "match" : {
      "name" : "node_enrich_policy",
      "indices" : [
        "node_attributes"
      ],
      "match_field" : "nodeId",
      "enrich_fields" : [
        "rackLocation",
        "machinePool",
        "hardwareSku",
        "bios",
        "os",
        "bmc",
        "microcode",
        "cpuId",
        "cpuDescription",
        "ssdModels",
        "ssdFirmware",
        "socFirmware",
        "nitroFirmware",
        "cerberusVersion",
        "region",
        "isUtilitySku",
        "isStorageSku",
        "dataCenter",
        "gpuFirmware",
        "nvSwitchFirmware",
        "cecFirmware",
        "gpuFpgaFirmware",
        "retimerFirmware"
      ]
    }
  }
}
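
The ingest pipeline that applies this policy is not shown in the thread. As a rough sketch only, a pipeline body using the `enrich` processor could be built like this. The target field `node_attributes` and the helper name `build_enrich_pipeline` are assumptions for illustration; the processor options `policy_name`, `field`, `target_field`, and `max_matches` are standard enrich processor parameters.

```python
import json


def build_enrich_pipeline(policy_name="node_enrich_policy",
                          match_field="nodeId",
                          target_field="node_attributes"):
    """Return an ingest pipeline body with a single enrich processor.

    target_field is a hypothetical name; the real pipeline may differ.
    """
    return {
        "description": "Enrich incoming documents by nodeId (illustrative)",
        "processors": [
            {
                "enrich": {
                    "policy_name": policy_name,
                    # Field on the incoming document used for the lookup.
                    "field": match_field,
                    # Where the matched enrich fields are written.
                    "target_field": target_field,
                    # 1 is the processor default: only one matching
                    # enrich document is attached per incoming document.
                    "max_matches": 1,
                }
            }
        ],
    }


if __name__ == "__main__":
    # Print the JSON that would be sent as the pipeline definition.
    print(json.dumps(build_enrich_pipeline(), indent=2))
```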

To follow up on this thread:
We discovered that ~200 objects each had around 50 copies in index "D". I am not sure how this would affect the ingest time for index E, or the enrichment index E, but after removing the duplicate values, the batch upload seems to have returned to its normal execution time.
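
For reference, the duplicate removal can be sketched as a small filter applied to the batch before it is uploaded. This is a minimal sketch, assuming the batch is a list of dicts keyed by `nodeId` and that the first copy of each object is the one to keep; both are assumptions, and `dedupe_by_key` is a made-up helper name.

```python
def dedupe_by_key(docs, key="nodeId"):
    """Keep the first document seen for each value of `key`.

    Documents that lack `key` are passed through unchanged.
    """
    seen = set()
    unique = []
    for doc in docs:
        value = doc.get(key)
        if value is not None:
            if value in seen:
                # A document with this key was already kept; skip the copy.
                continue
            seen.add(value)
        unique.append(doc)
    return unique


if __name__ == "__main__":
    batch = [
        {"nodeId": "n1", "isAvailable": True},
        {"nodeId": "n1", "isAvailable": False},
        {"nodeId": "n2", "isAvailable": True},
    ]
    print(dedupe_by_key(batch))
```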

So although the removal of the duplicate values helped fix the issue, I am still trying to understand where the enrichment process fell short.

When I re-created the issue in our dev environment, I found that the enrich policy does not appear in Elasticsearch after running GET _cat/indices, although another index does appear. This index is called unexpected:

yellow open unexpected                                                                mTh1x92dQzCb52P0GfWTLA 1 1       1      0   18.3kb   18.3kb

Furthermore, this index has the exact same schema as the intended enrich policy, which leads me to believe that something is going wrong when executing the enrich policy.

Below is a snippet of the log file from the elasticsearch instance around the time it creates the enrich index.

Any suggestions on where else to look to figure out why the enrich index is not getting set up correctly?

[2022-12-07T00:01:27,308][INFO ][o.e.x.e.EnrichPolicyRunner] [junodev01elasticsearch] Policy [cluster_enrich_policy]: Running enrich policy
[2022-12-07T00:01:27,312][INFO ][o.e.c.m.MetadataCreateIndexService] [junodev01elasticsearch] [.enrich-cluster_enrich_policy-1670371287309] creating index, cause [api], templates [], shards [1]/[0]
[2022-12-07T00:01:27,939][INFO ][o.e.x.e.EnrichPolicyRunner] [junodev01elasticsearch] Policy [cluster_enrich_policy]: Transferred [7376] documents to enrich index [.enrich-cluster_enrich_policy-1670371287309]
[2022-12-07T00:01:28,187][INFO ][o.e.x.e.EnrichPolicyRunner] [junodev01elasticsearch] Policy [cluster_enrich_policy]: Policy execution complete
[2022-12-07T00:02:21,998][INFO ][o.e.c.m.MetadataDeleteIndexService] [junodev01elasticsearch] [.enrich-cluster_enrich_policy-1670198774201/wlgemRckTvyjEU8ozHpIRQ] deleting index
[2022-12-07T12:30:04,445][INFO ][o.e.x.e.EnrichPolicyRunner] [junodev01elasticsearch] Policy [node_enrich_policy]: Running enrich policy
[2022-12-07T12:30:04,448][INFO ][o.e.c.m.MetadataCreateIndexService] [junodev01elasticsearch] [.enrich-node_enrich_policy-1670416204446] creating index, cause [api], templates [], shards [1]/[0]
[2022-12-07T12:32:22,014][INFO ][o.e.c.m.MetadataDeleteIndexService] [junodev01elasticsearch] [.enrich-node_enrich_policy-1670416204446/oszKMliPQl2EQMHjqzrK3w] deleting index
[2022-12-07T12:32:22,197][INFO ][o.e.c.m.MetadataCreateIndexService] [junodev01elasticsearch] [.enrich-node_enrich_policy-1670416204446] creating index, cause [auto(bulk api)], templates [], shards [1]/[1]
[2022-12-07T12:32:22,487][INFO ][o.e.c.m.MetadataMappingService] [junodev01elasticsearch] [.enrich-node_enrich_policy-1670416204446/bXDyhdNqSRG9j_MJO9Sxiw] create_mapping [_doc]
[2022-12-07T12:35:54,509][INFO ][o.e.x.e.EnrichPolicyRunner] [junodev01elasticsearch] Policy [node_enrich_policy]: Transferred [3465265] documents to enrich index [.enrich-node_enrich_policy-1670416204446]
[2022-12-07T12:36:53,050][INFO ][o.e.c.r.a.AllocationService] [junodev01elasticsearch] updating number_of_replicas to [0] for indices [.enrich-node_enrich_policy-1670416204446]
[2022-12-07T12:36:53,375][INFO ][o.e.x.e.EnrichPolicyRunner] [junodev01elasticsearch] Policy [node_enrich_policy]: Policy execution complete
[2022-12-07T12:47:22,015][INFO ][o.e.c.m.MetadataDeleteIndexService] [junodev01elasticsearch] [.enrich-node_enrich_policy-1670416204446/bXDyhdNqSRG9j_MJO9Sxiw] deleting index
[2022-12-07T12:47:22,015][INFO ][o.e.c.m.MetadataDeleteIndexService] [junodev01elasticsearch] [.enrich-node_enrich_policy-1670353673539/GFc1rhp8R2mN_nARbfxoPw] deleting index
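
To compare how long each policy execution takes, the `EnrichPolicyRunner` lines above can be paired up with a short script. This is a minimal sketch that assumes the log line layout shown in this snippet; the function name `policy_run_durations` is made up for illustration.

```python
import re
from datetime import datetime

# Matches EnrichPolicyRunner lines in the layout shown in the log snippet.
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\[\w+\s*\]\[o\.e\.x\.e\.EnrichPolicyRunner\] "
    r"\[[^\]]+\] Policy \[(?P<policy>[^\]]+)\]: (?P<msg>.*)$"
)
TS_FORMAT = "%Y-%m-%dT%H:%M:%S,%f"


def policy_run_durations(lines):
    """Return (policy, seconds) for each start/complete pair found."""
    started = {}
    durations = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # Not an EnrichPolicyRunner line.
        ts = datetime.strptime(match["ts"], TS_FORMAT)
        policy, msg = match["policy"], match["msg"]
        if msg.startswith("Running enrich policy"):
            started[policy] = ts
        elif msg.startswith("Policy execution complete") and policy in started:
            elapsed = (ts - started.pop(policy)).total_seconds()
            durations.append((policy, elapsed))
    return durations


if __name__ == "__main__":
    import sys

    # Usage: python enrich_durations.py < elasticsearch.log
    for policy, seconds in policy_run_durations(sys.stdin):
        print(f"{policy}: {seconds:.3f} s")
```

Applied to the snippet above, this pairs the 00:01:27,308 and 00:01:28,187 lines for cluster_enrich_policy and the 12:30:04,445 and 12:36:53,375 lines for node_enrich_policy.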
