Unable to parse "message"

Hello.
New user of Logstash here, so please bear with me!
Sending over TCP from Python using logstash_async, I receive the event in Logstash (see the bottom of this message). How can I parse the "message" field into separate fields?
I have tried the filter below, but then the messages stop arriving:

filter {
   json {
      source => "message"
   }
}

Thank you.

{
  "_index": ".ds-logs-generic-default-2023.05.15-000001",
  "_id": "ID7EKIgBdldke0oWVk1p",
  "_version": 1,
  "_score": 0,
  "_source": {
    "@version": "1",
    "@timestamp": "2023-05-17T08:12:15.741219356Z",
    "event": {
      "original": "{\"@timestamp\": \"2023-05-17T08:12:15.308Z\", \"@version\": \"1\", \"host\": \"Razor17\", \"level\": \"INFO\", \"logsource\": \"Razor17\", \"message\": \"test sending to logstash remotely\", \"pid\": 51968, \"program\": \"C:\\\\Users\\\\benoi\\\\PycharmProjects\\\\pythonProject\\\\ExcelSheets\\\\snippets.py\", \"type\": \"python-logstash\", \"extra\": {\"func_name\": \"<module>\", \"interpreter\": \"c:\\\\Users\\\\benoi\\\\PycharmProjects\\\\venv310\\\\Scripts\\\\python.exe\", \"interpreter_version\": \"3.10.2\", \"line\": 2931, \"logger_name\": \"__main__\", \"logstash_async_version\": \"2.5.0\", \"path\": \"C:\\\\Users\\\\benoi\\\\PycharmProjects\\\\pythonProject\\\\ExcelSheets\\\\snippets.py\", \"process_name\": \"MainProcess\", \"thread_name\": \"MainThread\"}}"
    },
    "message": "{\"@timestamp\": \"2023-05-17T08:12:15.308Z\", \"@version\": \"1\", \"host\": \"Razor17\", \"level\": \"INFO\", \"logsource\": \"Razor17\", \"message\": \"test sending to logstash remotely\", \"pid\": 51968, \"program\": \"C:\\\\Users\\\\benoi\\\\PycharmProjects\\\\pythonProject\\\\ExcelSheets\\\\snippets.py\", \"type\": \"python-logstash\", \"extra\": {\"func_name\": \"<module>\", \"interpreter\": \"c:\\\\Users\\\\benoi\\\\PycharmProjects\\\\venv310\\\\Scripts\\\\python.exe\", \"interpreter_version\": \"3.10.2\", \"line\": 2931, \"logger_name\": \"__main__\", \"logstash_async_version\": \"2.5.0\", \"path\": \"C:\\\\Users\\\\benoi\\\\PycharmProjects\\\\pythonProject\\\\ExcelSheets\\\\snippets.py\", \"process_name\": \"MainProcess\", \"thread_name\": \"MainThread\"}}",
    "type": "python-tcp",
    "data_stream": {
      "type": "logs",
      "dataset": "generic",
      "namespace": "default"
    }
  },
  "fields": {
    "@timestamp": [
      "2023-05-17T08:12:15.741Z"
    ],
    "event.original": [
      "{\"@timestamp\": \"2023-05-17T08:12:15.308Z\", \"@version\": \"1\", \"host\": \"Razor17\", \"level\": \"INFO\", \"logsource\": \"Razor17\", \"message\": \"test sending to logstash remotely\", \"pid\": 51968, \"program\": \"C:\\\\Users\\\\benoi\\\\PycharmProjects\\\\pythonProject\\\\ExcelSheets\\\\snippets.py\", \"type\": \"python-logstash\", \"extra\": {\"func_name\": \"<module>\", \"interpreter\": \"c:\\\\Users\\\\benoi\\\\PycharmProjects\\\\venv310\\\\Scripts\\\\python.exe\", \"interpreter_version\": \"3.10.2\", \"line\": 2931, \"logger_name\": \"__main__\", \"logstash_async_version\": \"2.5.0\", \"path\": \"C:\\\\Users\\\\benoi\\\\PycharmProjects\\\\pythonProject\\\\ExcelSheets\\\\snippets.py\", \"process_name\": \"MainProcess\", \"thread_name\": \"MainThread\"}}"
    ],
    "data_stream.namespace": [
      "default"
    ],
    "data_stream.dataset": [
      "generic"
    ],
    "@version": [
      "1"
    ],
    "type": [
      "python-tcp"
    ],
    "message": [
      "{\"@timestamp\": \"2023-05-17T08:12:15.308Z\", \"@version\": \"1\", \"host\": \"Razor17\", \"level\": \"INFO\", \"logsource\": \"Razor17\", \"message\": \"test sending to logstash remotely\", \"pid\": 51968, \"program\": \"C:\\\\Users\\\\benoi\\\\PycharmProjects\\\\pythonProject\\\\ExcelSheets\\\\snippets.py\", \"type\": \"python-logstash\", \"extra\": {\"func_name\": \"<module>\", \"interpreter\": \"c:\\\\Users\\\\benoi\\\\PycharmProjects\\\\venv310\\\\Scripts\\\\python.exe\", \"interpreter_version\": \"3.10.2\", \"line\": 2931, \"logger_name\": \"__main__\", \"logstash_async_version\": \"2.5.0\", \"path\": \"C:\\\\Users\\\\benoi\\\\PycharmProjects\\\\pythonProject\\\\ExcelSheets\\\\snippets.py\", \"process_name\": \"MainProcess\", \"thread_name\": \"MainThread\"}}"
    ],
    "data_stream.type": [
      "logs"
    ]
  }
}

Hi @bsauvage1, your message field seems to contain escaped JSON. In that case, you first need to remove the escape characters before parsing the JSON.

Have you tried the mutate filter? Mutate filter plugin | Logstash Reference [8.7] | Elastic
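For example, something along these lines (a rough sketch, assuming the backslash escapes are literally present in the field and the default config.support_escapes: false):

filter {
   mutate {
      # replace each \" with a plain " so the string becomes parseable JSON
      gsub => ["message", '\\"', '"']
   }
   json {
      source => "message"
   }
}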

Thanks @Alex_Salgado-Elastic
I added codec => json to the tcp input, as it's supposed to do just that.
As the message wasn't arriving, I added a debug output stdout {codec => rubydebug} and checked the logs.
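For reference, the relevant parts of the pipeline now look something like this (the port and hosts are placeholders):

input {
   tcp {
      port => 5959      # placeholder; whatever the logstash_async transport targets
      codec => json     # decode each incoming line as JSON into event fields
   }
}
output {
   stdout { codec => rubydebug }          # temporary debug output
   elasticsearch {
      hosts => ["http://localhost:9200"]  # placeholder host
   }
}
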
The issue now is in the parsing of "process". I checked the ECS definitions and the field seems to be correctly formed, so I'm not sure what's going on. I've temporarily removed it with mutate and the message is now going in, but I'd like to understand why this error happens.

[2023-05-18T00:14:11,299][WARN ][logstash.outputs.elasticsearch][main][6d2fa9664f0cd654c856a248033a8772afbcd07dc5cc5a974c91b14e3aed3707] Could not index event to Elasticsearch. status: 400, action: ["create", {:_id=>nil, :_index=>"logs-generic-default", :routing=>nil}, {"message"=>"test sending to logstash remotely", "ecs"=>{"version"=>"1.6.0"}, "log.level"=>"info", "log"=>{"logger"=>"__main__", "origin"=>{"file"=>{"line"=>2931, "name"=>"snippets.py"}, "function"=>"<module>"}, "original"=>"test sending to logstash remotely"}, "process"=>{"thread"=>{"id"=>64024, "name"=>"MainThread"}, "name"=>"MainProcess", "pid"=>17356}, "@timestamp"=>2023-05-18T00:14:00.701Z, "@version"=>"1", "type"=>"python-tcp", "data_stream"=>{"type"=>"logs", "dataset"=>"generic", "namespace"=>"default"}}], response: {"create"=>{"_index"=>".ds-logs-generic-default-2023.05.15-000001", "_id"=>"Gso1LIgBR4WsEcuaAddT", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse field [process] of type [long] in document with id 'Gso1LIgBR4WsEcuaAddT'. Preview of field's value: '{name=MainProcess, pid=17356, thread={name=MainThread, id=64024}}'", "caused_by"=>{"type"=>"illegal_argument_exception", "reason"=>"Cannot parse object as number"}}}}
{
       "message" => "test sending to logstash remotely",
           "ecs" => {
        "version" => "1.6.0"
    },
     "log.level" => "info",
           "log" => {
          "logger" => "__main__",
          "origin" => {
                "file" => {
                "line" => 2931,
                "name" => "snippets.py"
            },
            "function" => "<module>"
        },
        "original" => "test sending to logstash remotely"
    },
       "process" => {
        "thread" => {
              "id" => 64024,
            "name" => "MainThread"
        },
          "name" => "MainProcess",
           "pid" => 17356
    },
    "@timestamp" => 2023-05-18T00:14:00.701Z,
      "@version" => "1",
          "type" => "python-tcp"
}

Hello. Sorry to come back to this. Is this an issue worth pursuing and raising a bug report about, either in the ECS python library or in Logstash itself?

Thanks!

It is not clear what your current issue is, since you didn't share any error log from Logstash.

You need to share logs from Logstash.

Apologies if it wasn't clearly visible; I believe it was the first line in the code section I had sent. Please let me know if the below is what you are after:

[2023-05-18T00:14:11,299][WARN ][logstash.outputs.elasticsearch][main][6d2fa9664f0cd654c856a248033a8772afbcd07dc5cc5a974c91b14e3aed3707] Could not index event to Elasticsearch. status: 400, action: ["create", {:_id=>nil, :_index=>"logs-generic-default", :routing=>nil}, {"message"=>"test sending to logstash remotely", "ecs"=>{"version"=>"1.6.0"}, "log.level"=>"info", "log"=>{"logger"=>"__main__", "origin"=>{"file"=>{"line"=>2931, "name"=>"snippets.py"}, "function"=>"<module>"}, "original"=>"test sending to logstash remotely"}, "process"=>{"thread"=>{"id"=>64024, "name"=>"MainThread"}, "name"=>"MainProcess", "pid"=>17356}, "@timestamp"=>2023-05-18T00:14:00.701Z, "@version"=>"1", "type"=>"python-tcp", "data_stream"=>{"type"=>"logs", "dataset"=>"generic", "namespace"=>"default"}}], response: {"create"=>{"_index"=>".ds-logs-generic-default-2023.05.15-000001", "_id"=>"Gso1LIgBR4WsEcuaAddT", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse field [process] of type [long] in document with id 'Gso1LIgBR4WsEcuaAddT'. Preview of field's value: '{name=MainProcess, pid=17356, thread={name=MainThread, id=64024}}'", "caused_by"=>{"type"=>"illegal_argument_exception", "reason"=>"Cannot parse object as number"}}}}

And the json input was:

{
       "message" => "test sending to logstash remotely",
           "ecs" => {
        "version" => "1.6.0"
    },
     "log.level" => "info",
           "log" => {
          "logger" => "__main__",
          "origin" => {
                "file" => {
                "line" => 2931,
                "name" => "snippets.py"
            },
            "function" => "<module>"
        },
        "original" => "test sending to logstash remotely"
    },
       "process" => {
        "thread" => {
              "id" => 64024,
            "name" => "MainThread"
        },
          "name" => "MainProcess",
           "pid" => 17356
    },
    "@timestamp" => 2023-05-18T00:14:00.701Z,
      "@version" => "1",
          "type" => "python-tcp"
}

In your event the [process] field is an object containing [process][thread], [process][name] etc. However, elasticsearch expects it to be of type long. That could be because there is a mapping, either explicitly set, or via a template, or through dynamic mapping of a previous event.

It is not a bug.

I am using the ECS template 1.6.0 (using the ecs-logging python library), which specifies [process] as an object with sub-objects.

I am running stack 8.6.2, and per the docs the pipeline should default to the implied ECS schema v8. Regardless, I added pipeline.ecs_compatibility: v1 to logstash.yml. Unfortunately it gives the same error as before.

Should I specify the use of the ecs schema differently?

No. I understand that ecs-v1/elasticsearch-8x.json defines [process] as an object containing a bunch of fields (args, arg_count, etc.) but that does not set the mapping in elasticsearch. To send ECS compatible events to elasticsearch you need the index to have an ECS compatible mapping, and yours doesn't.

Use the Mapping API and I am confident that it will show you that.
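For example, in the Kibana Dev Tools console (using the backing index name from the error above):

GET .ds-logs-generic-default-2023.05.15-000001/_mapping/field/process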

So the question is, how did the index get that mapping? I don't run elasticsearch so I cannot test it, but my reading of the code of the output plugin is that it will attempt to set an ECS compatible template on the index, which, provided none of the fields already exist, would result in a compatible mapping.

Working back from that it suggests that something has already created the [process] field with a mapping type of long (possibly from an event where [process] is just the PID).
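For example, a single hypothetical earlier event like the following would be enough for dynamic mapping to set the type of [process] to long:

{ "message": "an earlier log line", "process": 12345 }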

Can you switch to another (not currently existing) index name and see if the problem persists?
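For example (the index name here is just a placeholder; if the output is running in data_stream mode you would change data_stream_dataset or data_stream_namespace instead):

output {
   elasticsearch {
      hosts => ["http://localhost:9200"]   # placeholder host
      index => "logs-python-test"          # hypothetical, not-yet-existing index
   }
}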

When I check my existing index (using the Kibana web UI), I see that it does indeed map [process] as a long.

I tried adding ecs_compatibility => "v1" in main.conf inside output { elasticsearch { ... } }, but I still get the same errors. I notice in the Stack Management GUI that the indices are still the same.
Is there an easy way to delete and rebuild the index so that it natively accepts the ECS field definitions, without resorting to some heavy mapping code?
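For example, would something like the below in the Dev Tools console be the way to go (understanding that it deletes the existing data), so that the data stream gets recreated with fresh mappings on the next event?

DELETE _data_stream/logs-generic-default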

I looked at the page below and tried to create a new template for the index pattern logs-*-*. I then filtered the component templates by "ecs" and still found quite a few; the only one that seems to have mappings for the [process] tree is .alerts-ecs-mappings, but I'm really unsure what to do here.
