ECS and Python logging

Hi,

I'm working on an update to a set of interlinked internal projects to adopt ECS (the Elastic Common Schema) for easier correlation and tracking of pipelines.

One of the apps uses Python's standard logging module, so I'm working on renaming the standard fields (msg to message, levelname to log.level, etc.). Is there a standard pattern for enabling this?

The logs aren't in elastic already, this is an onboarding exercise so we can rename as we like.

Is this the correct way of addressing this or is there a better way?

Thanks in advance

Hi Darren, it sounds like you are on the right track. All transformations to ECS start with performing a logical mapping of your fields to ECS fields. A short and sweet set of guidelines can be found here: https://www.elastic.co/guide/en/ecs/current/ecs-converting.html.
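To make that concrete for Python's logging module, here is a minimal sketch of such a mapping using a custom Formatter. The class name and the particular subset of ECS fields are my own choices for illustration, not something prescribed by the guide:

```python
import json
import logging


class EcsFormatter(logging.Formatter):
    """Minimal illustration: map stdlib LogRecord attributes to ECS names."""

    def format(self, record):
        doc = {
            "@timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S%z"),
            "message": record.getMessage(),          # msg -> message
            "log": {
                "level": record.levelname.lower(),   # levelname -> log.level
                "origin": {"file": {"name": record.filename}},
            },
        }
        return json.dumps(doc)
```

Attach it to a handler with handler.setFormatter(EcsFormatter()) and each record comes out as one ECS-shaped JSON line.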

Hope this helps.

Thank you,

So I've wrangled Python logging to output something like this, which looks like it meets the ECS standard:

{
"log.origin.file.name":"program.py",
"@timestamp":"2019-10-08T11:11:23+0100",
"@version":"1",
"transaction.id":"not_available",
"client.ip":"unavailable",
"message":"test message"
}

But I get the following error (most of the fields give similar messages) when using Filebeat to forward the entries to Elasticsearch:

{"type":"mapper_parsing_exception","reason":"failed to parse field [client.ip] of type [ip] in document with id 'jgTkq20BKXzrRSyIF85V'. Preview of field's value: 'unavailable'","caused_by":{"type":"illegal_argument_exception","reason":"'unavailable' is not an IP string literal."}

Any idea what's going on here?

My input looks like this:

- type: log
  enabled: true
  paths:
    - /local/program_logs/client-*.ecs
  json.keys_under_root: true
  json.add_error_key: true
  processors:
    - dissect:
        tokenizer: '/local/program_logs/client-%{service.name}.ecs'
        field: log.file.path

And the ES output is pretty plain: hostname with a username/password combo.

Oh, just found this in the ECS docs:

  • The document structure should be nested JSON objects. If you use Beats or Logstash, the nesting of JSON objects is done for you automatically. If you’re ingesting to Elasticsearch using the API, your fields must be nested objects, not strings containing dots.

So I'm forced to provide the JSON in nested form? I'm not sure that's possible with Python logging. Any ideas on what I can do?

Yes, you got it, the dots represent nesting :+1:

Note that, just like when querying via Kibana, the Python library may translate dots to nesting for you; I'm not sure.

What leads me to think so is that you're getting a legit error on client.ip. The datatype of this field is ip, and the text "unavailable" is not a valid IP :slight_smile:

When you don't have a value, it's OK to leave the field out entirely. You can later query for documents that are missing a value.
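If those placeholder values come from your own code, the simplest fix is to drop them before serializing. A rough sketch; the list of placeholders is just a guess based on the examples above:

```python
# Values that mean "no data" in this hypothetical pipeline.
PLACEHOLDERS = {None, "", "unavailable", "not_available"}


def prune(doc):
    """Remove fields whose value is a 'no data' placeholder."""
    return {key: value for key, value in doc.items() if value not in PLACEHOLDERS}
```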

Second note on client.ip (or many of the other .ip fields).

In order to support event sources that alternate between IP addresses, hostnames, or even Unix sockets when referring to an address, we've added a .address field which can take free-form text.

If your event source provides such ambiguous values, the recommendation is to store the value in .address, and only when it's an IP copy it out to .ip.

If your source only provides IP addresses, it's fine to directly store them in .ip.
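That copy-only-when-valid step is easy to do on the producing side with the stdlib ipaddress module. A sketch; the helper name and the client prefix default are my own:

```python
import ipaddress


def address_fields(value, prefix="client"):
    """Always store the raw value in .address; add .ip only if it parses as an IP."""
    fields = {prefix + ".address": value}
    try:
        ipaddress.ip_address(value)
    except ValueError:
        pass  # hostname, socket path, or placeholder: leave .ip unset
    else:
        fields[prefix + ".ip"] = value
    return fields
```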

The Python library is giving output exactly as I pasted. Is there any way to get Filebeat to convert the dot notation into fully nested JSON? It doesn't look like it's easily possible to get Python to nest it for me without considerable development.

I'll have a look at removing the entries that are "unavailable"; it makes sense to omit those fields when not needed instead.

There's no direct way to automatically replace arbitrary dots in Beats, as far as I can tell. If you're using Logstash, the "de_dot" plugin will do it.

But in Beats, you have two options:

  • If you have a well-defined set of fields that get created with dots, perhaps you can use the Beats rename processor to rename each field. I'm not sure whether the processor can read the dotted keys; however, renaming to dotted target keys will translate to nesting in the output.
  • If this doesn't work, or if you have too many different fields depending on the event type, you can do it programmatically with the Beats script processor.

But ultimately the goal is not to have to do that, so changing the application itself to produce nested keys will be optimal for you, as there will be no translation step needed in the pipeline anymore.
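If you can touch the application, un-dotting the keys before json.dumps is only a few lines of Python. A sketch:

```python
def nest(flat):
    """Turn dotted keys ({"a.b.c": 1}) into nested objects ({"a": {"b": {"c": 1}}})."""
    nested = {}
    for dotted_key, value in flat.items():
        node = nested
        *parents, leaf = dotted_key.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested
```

Note this assumes no key is both a leaf and a prefix of another key (e.g. "log" alongside "log.level"); that case would need explicit conflict handling.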

The rename processor seems to work, although I have to change the name to avoid duplicate keys.

I'm trying to sort out the event.created field now. I get the following error:

{"type":"mapper_parsing_exception","reason":"failed to parse field [event.created] of type [date] in document with id 'SApSsG0BKXzrRSyIEpSm'. Preview of field's value: '1.5706212264142175E9'","caused_by":{"type":"illegal_argument_exception","reason":"failed to parse date field [1.5706212264142175e+09] with format [strict_date_optional_time||epoch_millis]","caused_by":{"type":"date_time_parse_exception","reason":"date_time_parse_exception: Failed to parse with all enclosed parsers"}}

when trying to use the following log line, with "created" being the field I'm renaming to event.created. It looks like it's in the correct epoch format, though.

{"created": 1570621231.4085894, "clientip": null, "clientid": null, "transactionid": "not_available", "levelno": 20}

So it seems the log file is formatted correctly, but Filebeat isn't ingesting it in the same format.

The default date mapping accepts epoch_millis, but your value is in (fractional) epoch seconds, which is why parsing fails. If you were fully in control of the index template or mapping, you could address this by playing with the Elasticsearch format attribute for date fields, and including an epoch format in the accepted formats.

However, that would be a painful route to go down, since you'd have to adjust your Beats template every time you run filebeat setup.

A better approach is to have the application output this as an ISO 8601 date directly, if that's an option.

If that's a no-go, you can convert the epoch timestamp directly in Beats with the timestamp processor, or in an Elasticsearch ingest pipeline with the date processor.
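Since the timestamp is produced in Python anyway, emitting ISO 8601 at the source is a one-liner with the stdlib. A sketch; the helper name is my own:

```python
from datetime import datetime, timezone


def iso8601(epoch_seconds):
    """Render a float epoch (e.g. time.time() or LogRecord.created) as ISO 8601 in UTC."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).isoformat()
    # e.g. iso8601(0) -> "1970-01-01T00:00:00+00:00"
```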