I am using Logstash 7.4 to ingest JSON Lines.
In this particular JSON Lines data, which is from a proprietary source, the event timestamp is the first time value in each incoming line.
By "first", I mean first in the serialized JSON Lines input, which might arrive as a stream over TCP or from a file. I am aware of the following text in the JSON standard (ECMA-404):
The JSON syntax ... does not assign any significance to the ordering of name/value pairs. ... [This] may be defined by JSON processors or in specifications defining specific uses of JSON for data interchange.
The property to use as the event timestamp depends on the "code" property, which identifies the incoming event type. For example:
{"code":"abc-123","system":"mysys","tranid":"xyz","start":"2019-10-22T13:00:00.01Z","cpu":0.05,"stop":"2019-10-22T13:00:00.02Z"}
{"code":"def-456","collected":"2019-10-22T13:15:00Z","errors":321,"#tran":54321}
In the first line, the event timestamp is the "start" property value, which is the fourth property in the line.
In the second line, the event timestamp is the "collected" property value, which is the second property in the line.
In each case, "start" and "collected" are the first properties with a time value.
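The "first time value" rule can be checked on the raw text, before any JSON parsing, with a leftmost regex search. The sketch below is Python rather than Logstash config. It is only an illustration. The pattern is a simplified ISO 8601 subset (an assumption: it covers the sample formats and requires an explicit offset such as "Z"), and the name first_iso8601 is hypothetical.

```python
import re
from typing import Optional

# Simplified ISO 8601 date-time pattern (assumption: covers the sample
# formats only; the offset, e.g. "Z" or "+01:00", is required).
ISO8601 = re.compile(
    r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:?\d{2})"
)

def first_iso8601(line: str) -> Optional[str]:
    """Return the leftmost ISO 8601-looking substring of the raw line, if any."""
    match = ISO8601.search(line)  # search() scans left to right
    return match.group(0) if match else None

samples = [
    '{"code":"abc-123","system":"mysys","tranid":"xyz","start":"2019-10-22T13:00:00.01Z","cpu":0.05,"stop":"2019-10-22T13:00:00.02Z"}',
    '{"code":"def-456","collected":"2019-10-22T13:15:00Z","errors":321,"#tran":54321}',
]
for line in samples:
    print(first_iso8601(line))
# prints:
# 2019-10-22T13:00:00.01Z
# 2019-10-22T13:15:00Z
```

This only works if the source reliably puts the event time first in the text. A string value that looks like a timestamp and appears earlier in the line would be picked instead.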
I could use the "code" property value with conditional (if) statements to identify the name of the JSON property to use as the event timestamp: if "code" has this value, then map the timestamp from that property. However, I'd prefer to avoid such conditional statements. I'd like to avoid any code-specific config.
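For comparison, the conditional approach amounts to a lookup from each "code" value to a property name. The Python sketch below is not Logstash config. The table TIMESTAMP_FIELD_BY_CODE and the function timestamp_by_code are hypothetical, and the table entries come from the two sample lines. The sketch shows why this approach needs maintenance: any event type that is missing from the table gets no timestamp.

```python
import json
from typing import Optional

# Hypothetical per-code lookup table: the code-specific config to avoid.
# Every new event type needs a new entry here.
TIMESTAMP_FIELD_BY_CODE = {
    "abc-123": "start",
    "def-456": "collected",
}

def timestamp_by_code(line: str) -> Optional[str]:
    event = json.loads(line)
    field = TIMESTAMP_FIELD_BY_CODE.get(event.get("code"))
    # Unknown codes fall through with no timestamp at all.
    return event.get(field) if field else None
```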
I've already successfully configured a different analytics platform (rhymes with "punk") to ingest this data without any code-specific config. That works because the platform uses a text/regex-based method to extract timestamps before it "knows" that the data is JSON Lines.
I'd appreciate suggestions for a Logstash config that:
- Identifies the first value in an incoming line of data that matches an "ISO8601" pattern, and uses the date filter to set the event timestamp
- (Possibly, and then) uses the json_lines codec for the data
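The two requested steps can be modeled as "extract the timestamp from the raw text first, then parse the JSON". The Python sketch below shows that order. In Logstash terms, it might correspond roughly to reading lines with the line codec instead of json_lines, then applying grok with the TIMESTAMP_ISO8601 pattern, the date filter, and the json filter on the message field. That mapping is an assumption and has not been tested against 7.4 here. The names line_to_event and parse_iso8601 are hypothetical, and "@timestamp" is used only to mirror the Logstash event field.

```python
import json
import re
from datetime import datetime
from typing import Any, Dict

# Simplified ISO 8601 pattern (assumption: sample formats only, offset required).
ISO8601 = re.compile(
    r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:?\d{2})"
)

def parse_iso8601(text: str) -> datetime:
    # %f accepts 1-6 fractional digits; %z accepts "Z" (Python 3.7+).
    fmt = "%Y-%m-%dT%H:%M:%S.%f%z" if "." in text else "%Y-%m-%dT%H:%M:%S%z"
    return datetime.strptime(text, fmt)

def line_to_event(line: str) -> Dict[str, Any]:
    # Step 1: treat the line as plain text and find the first time value.
    match = ISO8601.search(line)
    if match is None:
        raise ValueError("no ISO 8601 time value in line")
    # Step 2: only now parse the line as JSON, keeping every property.
    event = json.loads(line)
    event["@timestamp"] = parse_iso8601(match.group(0))
    return event
```

The raw line is discarded once the function returns, so no "message" field has to be removed afterwards.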
It's occurred to me to customize the json_lines codec to output a message field containing the entire (unparsed) input line. I could then extract the timestamp from that message text (e.g. with grok) and remove the message field before output. However, I'd prefer a solution that does not involve a custom codec and is still reasonably performant.
Thoughts, suggestions welcome.
P.S. I've opened a related issue (feature request) on GitHub for json_lines: "Add option to output unparsed line in message field, in addition to parsing".