my_weblog_pipeline Grok Error in Elastic Observability Engineer On-Demand Course

Course: Elastic Observability Engineer
Version: On-Demand
Question:

I’m currently working through the labs in the Elastic Observability Engineer (On-Demand) course. I completed Labs 5.1 through 5.4, but ran into an error.

When I first created “my_weblog_pipeline”, it was working fine. After 5.3, though, the Grok processor started erroring out. I continued in the hope that a later step would fix it, but it didn’t. I’m now done with 5.4, and the Grok step still gives this error:

{
  "docs": [
    {
      "error": {
        "root_cause": [
          {
            "type": "illegal_argument_exception",
            "reason": "field [original] not present as part of path [event.original]"
          }
        ],
        "type": "illegal_argument_exception",
        "reason": "field [original] not present as part of path [event.original]"
      }
    }
  ]
}

When I compare my my_weblog_pipeline with what the course says I should have, they are exactly the same, yet the course insists its version works while mine errors out. Assuming I had done something wrong, I cloned the pipeline, deleted the original, and recreated it from scratch, following all of the directions again. This time it gave the error right from the start. What can I do to get this working properly again?

Here is the my_weblog_pipeline that I have:

[
  {
    "grok": {
      "field": "event.original",
      "patterns": [
        "%{IP:ip_address} %{USER:username} %{USER:user} \\[%{HTTPDATE:@timestamp}\\] \"%{DATA:request}\" %{NUMBER:response_code} %{NUMBER:response_size} \"%{DATA:referer}\" %{QS:useragent}"
      ]
    }
  },
  {
    "remove": {
      "field": [
        "event.original",
        "agent",
        "cloud",
        "data_stream",
        "ecs",
        "elastic_agent",
        "event",
        "host",
        "http",
        "input",
        "log",
        "nginx",
        "related",
        "source",
        "tags",
        "url",
        "user_agent"
      ]
    }
  },
  {
    "drop": {
      "if": "ctx.response_code == \"200\""
    }
  },
  {
    "convert": {
      "field": "response_code",
      "type": "integer"
    }
  },
  {
    "convert": {
      "field": "response_size",
      "type": "long"
    }
  },
  {
    "date": {
      "field": "@timestamp",
      "formats": [
        "dd/MMM/yyyy:HH:mm:ss Z"
      ]
    }
  },
  {
    "user_agent": {
      "field": "useragent"
    }
  },
  {
    "geoip": {
      "field": "ip_address"
    }
  },
  {
    "remove": {
      "field": "useragent"
    }
  },
  {
    "set": {
      "field": "geoip.city_name",
      "value": "Chicago",
      "if": "ctx.geoip.country_iso_code != \"US\""
    }
  },
  {
    "enrich": {
      "field": "geoip.city_name",
      "policy_name": "add_zipcode_policy",
      "target_field": "geoip.enriched"
    }
  }
]
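
For reference, here is roughly how I have been testing it in Dev Tools (the sample document below is just an illustration I made up, not the actual lab data):

POST _ingest/pipeline/my_weblog_pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "event": {
          "original": "192.168.1.10 - frank [10/Oct/2000:13:55:36 -0700] \"GET /index.html HTTP/1.0\" 404 2326 \"-\" \"Mozilla/4.08\""
        }
      }
    }
  ]
}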

Hello @jasonlevine

Welcome to the Community!!

I see the pipeline you have shared matches the final pipeline in 5.4. When you test a record from the data view logs-* with log.file.path: “var/log/nginx/access.log”, any record with a 200 status code should be dropped by the pipeline. For a record whose status code is not 200, what output or error are you receiving?

As per the error shared, it says the “original” field is not present as part of the path event.original. Can you please check one record in Discover and see if this field actually exists? If it does not, I believe the raw log line is probably in the “message” field instead, and you might have to use message in place of event.original. Your first processor (grok) needs the field event.original, which is not available in the index:

{
  "grok": {
    "field": "event.original",
    "patterns": [
      "%{IP:ip_address} %{USER:username} %{USER:user} \\[%{HTTPDATE:@timestamp}\\] \"%{DATA:request}\" %{NUMBER:response_code} %{NUMBER:response_size} \"%{DATA:referer}\" %{QS:useragent}"
    ]
  }
}
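
If event.original is indeed missing, a minimal change would be to point this first processor at message instead (a sketch, assuming the raw log line lives in message; ignore_missing is optional and simply skips documents where the field is absent instead of failing):

{
  "grok": {
    "field": "message",
    "ignore_missing": true,
    "patterns": [
      "%{IP:ip_address} %{USER:username} %{USER:user} \\[%{HTTPDATE:@timestamp}\\] \"%{DATA:request}\" %{NUMBER:response_code} %{NUMBER:response_size} \"%{DATA:referer}\" %{QS:useragent}"
    ]
  }
}

In that case you would also replace "event.original" with "message" in the later remove processor so the raw line is still cleaned up.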

Thanks!!