Enrich processor missing some documents

Hello,

I have an ingest pipeline with a simple enrich processor; it adds a field named host.type based on host.name.

This enrich processor is part of a final_pipeline that is set on the index. I'm sure the pipeline is executed, because its first processor creates a new field that is present in every document.

This is the final pipeline:

{
  "description": "crowdstrike final pipeline",
  "processors": [
    {
      "pipeline": {
        "name": "crowdstrike-set-company",
        "ignore_failure": true
      }
    },
    {
      "enrich": {
        "field": "host.name",
        "policy_name": "crowdstrike-host-type",
        "target_field": "enrich",
        "ignore_missing": true,
        "description": "adds 'enrich' according to the host type"
      }
    },
    {
      "rename": {
        "field": "enrich.host.type",
        "target_field": "host.type",
        "ignore_missing": true,
        "if": "ctx.enrich?.host?.type != null"
      }
    },
    {
      "remove": {
        "field": "enrich.host.name",
        "ignore_missing": true
      }
    },
    {
      "set": {
        "field": "host.type",
        "value": "UNKNOWN",
        "override": false,
        "if": "ctx.host?.type == null"
      }
    }
  ],
  "on_failure": [
    {
      "set": {
        "field": "error.message",
        "value": "{{ _ingest.on_failure_message }}"
      }
    }
  ]
}

As you can see, I have a fallback that sets host.type if, for some reason, the enrich processor does not work or the enrich index is not up to date with the host inventory. We need this field to exist for our alerting system to work.

The policy for this enrich processor is the following:

{
  "match": {
    "indices": "inventory-crowdstrike",
    "match_field": "host.name",
    "enrich_fields": ["host.type"]
  }
}

So the documents in this .enrich-* index look something like this:

{
  "_index" : ".enrich-crowdstrike-host-type-1643385722111",
  "_type" : "_doc",
  "_id" : "BMhsoX4BuJmOVklEI0aF",
  "_score" : 9.393939,
  "_source" : {
    "host" : {
      "name" : "REDACTED-HOSTNAME",
      "type" : "WORKSTATION"
    }
  }
}

The issue is that sometimes it works and sometimes it doesn't. For example, when I filter in Kibana for a host.name that exists in the .enrich index, some of its documents were enriched and others were not.

Any idea how to troubleshoot this? There seems to be no failure in the pipeline, since the error.message field is not created.

I'm on 7.16.3.

Just a couple of thoughts...

Case sensitivity? Leading/trailing spaces?
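For the case/whitespace question, one quick check is a `term` query against the enrich index, pasting the host.name value verbatim from a document that was not enriched. As far as I know, a match policy stores the match field as `keyword` in the enrich index, so a case or whitespace difference should show up as zero hits. The index name is the one from the search output above (its timestamp suffix changes every time the policy is executed), and `REDACTED-NOT-WORKING` is a placeholder:

```
GET /.enrich-crowdstrike-host-type-1643385722111/_search
{
  "query": {
    "term": {
      "host.name": "REDACTED-NOT-WORKING"
    }
  }
}
```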

Of course, if you added new hosts to the source index (inventory-crowdstrike), you need to re-execute the enrich policy, right?
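Re-executing the policy after the inventory changes is a single request (policy name taken from the final pipeline above). I believe it waits for completion by default:

```
POST /_enrich/policy/crowdstrike-host-type/_execute
```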

Also, on a multi-node cluster the execution may take a bit longer, as the enrich index needs to be distributed to each node. Make sure it has finished.
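To check that, the enrich stats API lists any policy that is still running under `executing_policies`, and `_cat/shards` shows whether the enrich index copies are STARTED. The wildcard is only meant to catch the current timestamped enrich index; treat this as a sketch and adjust the pattern if it does not match anything in your cluster:

```
GET /_enrich/_stats

GET /_cat/shards/.enrich-crowdstrike-host-type*?v
```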

I have not seen that behavior before, but I have not run enrich on 7.16.3 yet.

Yeah, I thought leading/trailing spaces could be the issue, but that doesn't seem to be it. I've checked the JSON documents and the field has exactly the same value, but not all of them were enriched.

I didn't notice this behavior when we were using 7.12.1, so I've opened a support ticket as this impacts our alerting systems.

I will update this post when I have more information, in case someone else faces the same issue.


If you run the enrich processor/pipeline by hand in Dev Tools on a document with a host.name that did not work, does it work?
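Something like the request below. `verbose=true` makes the simulate API return the document after each processor instead of only the final result, which should show at which step enrich.host.type goes missing. `REDACTED-NOT-WORKING` is a placeholder:

```
POST _ingest/pipeline/crowdstrike-final-pipeline/_simulate?verbose=true
{
  "docs": [
    {
      "_index": "index",
      "_id": "id",
      "_source": {
        "host": {
          "name": "REDACTED-NOT-WORKING"
        }
      }
    }
  ]
}
```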

If you reindex through the enrich pipeline, do you get the same documents missing enrichment, or is it random?
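A small reindex through the pipeline could look like this. `SOURCE-INDEX` and `reindex-enrich-test` are hypothetical names; using a test index without `index.final_pipeline` set keeps the pipeline from running a second time, and `max_docs` just keeps the test small:

```
POST _reindex
{
  "max_docs": 1000,
  "source": {
    "index": "SOURCE-INDEX"
  },
  "dest": {
    "index": "reindex-enrich-test",
    "pipeline": "crowdstrike-final-pipeline"
  }
}
```

Running it twice and comparing which host.name values end up with host.type UNKNOWN would answer the "same or random" question.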

Ok, I found something strange.

If I simulate the pipeline with a hostname that I'm sure doesn't exist in the enrich index, this is the output:

request

POST _ingest/pipeline/crowdstrike-final-pipeline/_simulate
{
  "docs": [
    {
      "_index": "index",
      "_id": "id",
      "_source": {
        "host": {
          "name": "NON_EXISTENT"
        }
      }
    }
  ]
}

response

{
  "docs" : [
    {
      "doc" : {
        "_index" : "index",
        "_type" : "_doc",
        "_id" : "id",
        "_source" : {
          "host" : {
            "name" : "NON_EXISTENT",
            "type" : "UNKNOWN"
          }
        },
        "_ingest" : {
          "timestamp" : "2022-01-30T00:35:38.308781141Z"
        }
      }
    }
  ]
}

Now, if I run the same simulate request with a host that I know exists in the enrich index and is being enriched correctly, this is the output:

{
  "docs" : [
    {
      "doc" : {
        "_index" : "index",
        "_type" : "_doc",
        "_id" : "id",
        "_source" : {
          "host" : {
            "name" : "REDACTED-WORKING",
            "type" : "DOMAIN CONTROLLER"
          },
          "enrich" : {
            "host" : {
              "name" : "REDACTED-WORKING",
              "type" : "DOMAIN CONTROLLER"
            }
          }
        },
        "_ingest" : {
          "timestamp" : "2022-01-30T00:39:52.541646675Z"
        }
      }
    }
  ]
}

As you can see, the fields enrich.host.name and enrich.host.type were created and the value of enrich.host.type was copied into host.type.

Now this is the strange part: if I run the same request with a host that exists in the enrich index but, for some reason, is not being enriched, this is the output:

{
  "docs" : [
    {
      "doc" : {
        "_index" : "index",
        "_type" : "_doc",
        "_id" : "id",
        "_source" : {
          "host" : {
            "name" : "REDACTED-NOT-WORKING",
            "type" : "UNKNOWN"
          },
          "enrich" : {
            "host" : { }
          }
        },
        "_ingest" : {
          "timestamp" : "2022-01-30T00:42:32.796243381Z"
        }
      }
    }
  ]
}

It created the enrich.host object, but not enrich.host.name or enrich.host.type, which is completely different from the response when the host does not exist in the enrich index.

So it seems the document passes through the enrich processor and the host is found, but for some reason the document is not enriched.

I will do more tests on Monday to try to track it down.
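A possible next test is to simulate an inline pipeline containing only the enrich processor (settings copied from the final pipeline above), so the rename/remove steps are out of the picture. Comparing its output with the full pipeline's output for the same host may help show whether the lookup itself or a later step leaves the empty enrich.host object. The host name is a placeholder:

```
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "enrich": {
          "field": "host.name",
          "policy_name": "crowdstrike-host-type",
          "target_field": "enrich",
          "ignore_missing": true
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "host": {
          "name": "REDACTED-NOT-WORKING"
        }
      }
    }
  ]
}
```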
