Disappearing Documents on Batch Upload with Enrich Policy

Hey folks,

My team and I are investigating some unexpected behavior when our system batch uploads documents to our Elasticsearch store. Occasionally, some of the documents that we expect the system to upload are not uploaded.

I was hoping the community could offer insight into how our system can capture more information about documents that fail to upload. With that telemetry in place, I'm hopeful our team can discover what the issue is.
Beyond capturing documents that fail to ingest, does Elastic offer a way to view a summary of all documents uploaded during a batch upload operation?

To upload the documents, our system uses the NEST client. NEST offers a callback for "dropped" documents via the DroppedDocumentCallback property on the BulkAllRequest<T> class. In this callback the system logs statistics on the dropped documents, but so far no dropped documents have ever been reported.

Below I have offered context on the batch upload, but please request more information as needed.

Thanks,
CJ

Source index for the enrich policy (unnecessary fields omitted for brevity):

{
  "node_attributes-72cdb71d-eb84-4bf1-85e0-80ffbb67a74e" : {
    "aliases" : {
      "node_attributes" : { }
    },
    "mappings" : {
      "properties" : {
        "clusterName" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "nodeId" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "routing" : {
          "allocation" : {
            "include" : {
              "_tier_preference" : "data_content"
            }
          }
        },
        "number_of_shards" : "1",
        "provided_name" : "node_attributes-72cdb71d-eb84-4bf1-85e0-80ffbb67a74e",
        "max_result_window" : "12000",
        "creation_date" : "1678364259267",
        "number_of_replicas" : "0",
        "uuid" : "D341mlLgQqWdSxkgirawJw",
        "version" : {
          "created" : "7170599"
        }
      }
    }
  }
}

Target Index:

{
  "node_availability-01510309512023" : {
    "aliases" : {
      "node_availability" : { }
    },
    "mappings" : {
      "properties" : {
        "attributes" : {
          "properties" : {
            // other properties omitted for brevity
            "nodeId" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            }
          }
        },
        "clusterName" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "diskConfiguration" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "isAvailable" : {
          "type" : "boolean"
        },
        "isContainerCountZero" : {
          "type" : "boolean"
        },
        "nodeId" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "routing" : {
          "allocation" : {
            "include" : {
              "_tier_preference" : "data_content"
            }
          }
        },
        "number_of_shards" : "1",
        "provided_name" : "node_availability-01510309512023",
        "max_result_window" : "12000",
        "creation_date" : "1678377063311",
        "number_of_replicas" : "0",
        "uuid" : "z6b9ZZ9cT_yufzDpWqWEIg",
        "version" : {
          "created" : "7170599"
        }
      }
    }
  }
}

Enrich Policy:

{
  "config" : {
    "match" : {
      "name" : "node_enrich_policy",
      "indices" : [
        "node_attributes"
      ],
      "match_field" : "nodeId",
      "enrich_fields" : [
        "rackLocation",
        "machinePool",
        "hardwareSku",
        "bios",
        "os",
        "bmc",
        "microcode",
        "cpuId",
        "cpuDescription",
        "ssdModels",
        "ssdFirmware",
        "socFirmware",
        "nitroFirmware",
        "cerberusVersion",
        "region",
        "isUtilitySku",
        "isStorageSku",
        "dataCenter",
        "gpuFirmware",
        "nvSwitchFirmware",
        "cecFirmware",
        "gpuFpgaFirmware",
        "retimerFirmware"
      ]
    }
  }
}

Ingest Pipeline Definition:

"node_ingest" : {
    "processors" : [
      {
        "enrich" : {
          "policy_name" : "node_enrich_policy",
          "field" : "nodeId",
          "target_field" : "attributes",
          "on_failure" : [
            {
              "drop" : {
                "description" : "Drop Documents that were failed to be enriched."
              }
            }
          ]
        }
      }
    ]
  },
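
Possible alternative to the drop handler (a sketch only):

The drop processor discards a document without raising an error, so the on_failure handler above leaves no trace of a document whose enrichment failed. A minimal sketch of one alternative, assuming it is acceptable to keep such documents, is to record the failure message and redirect the document to a separate index. The field name ingest_error and the failed- index prefix are hypothetical choices made for illustration; _ingest.on_failure_message and the _index metadata field are standard ingest pipeline features.

```json
{
  "processors" : [
    {
      "enrich" : {
        "policy_name" : "node_enrich_policy",
        "field" : "nodeId",
        "target_field" : "attributes",
        "on_failure" : [
          {
            "set" : {
              "description" : "Illustrative: store the failure reason in a hypothetical ingest_error field.",
              "field" : "ingest_error",
              "value" : "{{ _ingest.on_failure_message }}"
            }
          },
          {
            "set" : {
              "description" : "Illustrative: redirect the document to a hypothetical failed-<index> index instead of dropping it.",
              "field" : "_index",
              "value" : "failed-{{{ _index }}}"
            }
          }
        ]
      }
    }
  ]
}
```

Documents redirected this way could then be counted or inspected by querying the failed-* indices, which may help narrow down whether the missing documents are being removed inside the pipeline or somewhere else.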
