Map timestamp using first time value in incoming JSON Lines?

I am using Logstash 7.4 to ingest JSON Lines.

In this particular JSON Lines data, which is from a proprietary source, the event timestamp is the first time value in each incoming line.

By "first", I mean first in the serialized JSON Lines input, which might arrive as a stream over TCP or from a file. I am aware of the following text in the JSON standard (ECMA-404):

> The JSON syntax ... does not assign any significance to the ordering of name/value pairs. ... [This] may be defined by JSON processors or in specifications defining specific uses of JSON for data interchange.

The property to use as the event timestamp depends on the code property, which identifies the incoming event type. For example:

{"code":"abc-123","system":"mysys","tranid":"xyz","start":"2019-10-22T13:00:00.01Z","cpu":0.05,"stop":"2019-10-22T13:00:00.02Z"}
{"code":"def-456","collected":"2019-10-22T13:15:00Z","errors":321,"#tran":54321}

In the first line, the event timestamp is the start property value, which is the fourth property in the line.

In the second line, the event timestamp is the collected property, which is the second property in the line.

In each case, start and collected are the first properties with a time value.

I could use the code property value with conditional (if) statements to identify the name of the JSON property to use as the event timestamp: if code is this value, then map the timestamp to the value of this property. However, I'd prefer to avoid such conditional statements. I'd like to avoid any code-specific config.
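For reference, the code-specific approach I'd like to avoid would look something like the following sketch. It assumes the line has already been parsed (for example by the json_lines codec), and it covers only the two sample codes shown above; every additional event type would need its own branch.

```
filter {
  # One branch per event type: this is the per-code config I want to avoid
  if [code] == "abc-123" {
    date { match => [ "start", "ISO8601" ] }
  } else if [code] == "def-456" {
    date { match => [ "collected", "ISO8601" ] }
  }
}
```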

I've already successfully configured a different analytics platform (rhymes with punk :wink: ) to ingest this data without any code-specific config. This is because that platform uses a text/regex-based method to extract timestamps, before it "knows" about the data being JSON Lines.

I'd appreciate suggestions for a Logstash config that:

  1. Identifies the first value in an incoming line of data that matches an "ISO8601" pattern and uses the date filter to set the event timestamp from it, and then (if possible)
  2. Uses the json_lines codec to parse the data

I've considered customizing the json_lines codec to also output a message field containing the entire (unparsed) input line. I could then extract the timestamp from that message text (e.g. with grok) and remove the message field before output. However, I'd prefer a solution that does not involve a custom codec, while still being reasonably performant.

Thoughts, suggestions welcome.

P.S. I've added a related issue (feature request) in GitHub for json_lines: "Add option to output unparsed line in message field, in addition to parsing".

You will have to use a json filter rather than a codec since, as you know, the codec does not give you the unparsed message. Then you can use:

    grok { match => [ "message", "(?<[@metadata][timestamp]>[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}(\.[0-9]{2})?Z)" ] }
    date { match => [ "[@metadata][timestamp]", "YYYY-MM-dd'T'HH:mm:ss'Z'", "YYYY-MM-dd'T'HH:mm:ss.SS'Z'" ] }
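To round this out, here is a sketch of the filters that would follow that grok. Two things here are my assumptions rather than something tested against your data: the timezone option, which I'd add because the quoted 'Z' in the patterns is matched as a literal character and so does not by itself tell the date filter that the value is UTC, and the removal of message once the json filter has parsed it.

```
filter {
  # Runs after the grok above has set [@metadata][timestamp]
  date {
    match => [ "[@metadata][timestamp]", "YYYY-MM-dd'T'HH:mm:ss'Z'", "YYYY-MM-dd'T'HH:mm:ss.SS'Z'" ]
    # Assumption: the source always reports UTC, as the trailing Z suggests
    timezone => "UTC"
  }
  # Parse the raw line now that the timestamp has been extracted
  json {
    source => "message"
    remove_field => [ "message" ]
  }
}
```

Since [@metadata] fields are not included in output by default, the temporary timestamp field should not need explicit removal.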

A local colleague supplied me with the following working config:

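    input {
      tcp {
        port => 6789
        # Read raw lines so the unparsed text is available in "message"
        codec => line
      }
    }
    filter {
      # Capture the first ISO 8601 timestamp in the raw line
      grok {
        match => { "message" => "%{TIMESTAMP_ISO8601:_time}" }
      }
      # Use it as the event timestamp
      date {
        match => [ "_time", "ISO8601" ]
      }
      # Parse the JSON, then drop the temporary fields
      json {
        source => "message"
        remove_field => [ "_time", "message" ]
      }
    }
    ...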
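A possible variation on this config, untested and offered only as a sketch: capture the timestamp into [@metadata], which outputs do not include by default, so the temporary field needs no cleanup, and run the date filter only when grok actually found a timestamp. The field name event_time and the tag _no_iso8601_timestamp are made up for illustration.

```
filter {
  grok {
    # Hypothetical metadata field name; grok stops at the first match in the line
    match => { "message" => "%{TIMESTAMP_ISO8601:[@metadata][event_time]}" }
    # Hypothetical tag, used instead of the default _grokparsefailure
    tag_on_failure => [ "_no_iso8601_timestamp" ]
  }
  # Skip the date filter for lines that contain no timestamp
  if [@metadata][event_time] {
    date {
      match => [ "[@metadata][event_time]", "ISO8601" ]
    }
  }
  json {
    source => "message"
    remove_field => [ "message" ]
  }
}
```

Lines without any ISO 8601 value would then keep the ingest time as their timestamp and carry the tag, which makes them easy to find later.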

Kudos for using the line codec. I wish I'd thought of that. Also for citing the supplied TIMESTAMP_ISO8601 Grok pattern, which is a perfect fit for this use case. Sweet!

Thank you very much for your answer. Yes, you're absolutely correct on both counts.

A colleague has just supplied me with a complete working config that I've copied here and ticked as a solution. I'm going to see if I can also tick yours (accept more than one solution).

Argh. It seems as if I can't select two solutions :frowning_face:. This puts me in a difficult situation.

With apologies, I'm going to select my local colleague's solution, because it's a complete config that specifies the line codec on the input and cites that nice prebuilt Grok pattern.

Thanks again for your (valid) advice, much appreciated.
