I am using Logstash 7.4 to ingest JSON Lines.
In this particular JSON Lines data, which is from a proprietary source, the event timestamp is the first time value in each incoming line.
By first, I am referring to the serialized JSON Lines input data, which might arrive in a stream over TCP or from a file. I am aware of the following text in the JSON standard (ECMA-404):
"The JSON syntax ... does not assign any significance to the ordering of name/value pairs. ... [This] may be defined by JSON processors or in specifications defining specific uses of JSON for data interchange."
The property to use as the event timestamp depends on the `code` property, which identifies the incoming event type. For example:
In the first line, the event timestamp is the `start` property value, which is the fourth property in the line.
In the second line, the event timestamp is the `collected` property value, which is the second property in the line.
In each case, `start` and `collected` are the first properties with a time value.
I could use the `code` property value with conditional (`if`) statements to identify the name of the JSON property to use as the event timestamp: if `code` is this value, then map the timestamp to the value of this property. However, I'd prefer to avoid such conditional statements. I'd like to avoid any `code`-specific config.
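For reference, the conditional approach I'd like to avoid would look roughly like the sketch below. The `code` values `"A"` and `"B"` are hypothetical placeholders, not real codes from the data. The sketch assumes the line has already been parsed as JSON, so that `code`, `start`, and `collected` exist as event fields.

```
filter {
  # Hypothetical code values; the real data would need one branch per event type
  if [code] == "A" {
    date {
      # Events with this code carry their timestamp in "start"
      match => [ "start", "ISO8601" ]
    }
  } else if [code] == "B" {
    date {
      # Events with this code carry their timestamp in "collected"
      match => [ "collected", "ISO8601" ]
    }
  }
}
```

Every new event type would need another branch, which is the maintenance burden I'm trying to avoid.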
I've already successfully configured a different analytics platform (rhymes with "punk") to ingest this data without any `code`-specific config. This is because that platform uses a text/regex-based method to extract timestamps before it "knows" that the data is JSON Lines.
I'd appreciate suggestions for a Logstash config that:
- Identifies the first value in an incoming line of data that matches an "ISO8601" pattern, and uses the `date` filter to set the event timestamp, and (possibly, and then)
- Uses the `json_lines` codec for the data
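One possible shape for such a config, as an untested sketch: read each line as plain text with the `line` codec instead of `json_lines`, extract the first ISO8601-looking value from the raw `message` with `grok`, set the event timestamp with `date`, and only then parse the line with the `json` filter. The port number and the temporary field name `event_time` are assumptions chosen for illustration.

```
input {
  tcp {
    port => 5000        # hypothetical port
    codec => line       # keep the raw line in "message" instead of parsing it
  }
}

filter {
  # Grok patterns are unanchored, so this captures the first ISO8601 match in the line
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:event_time}" }
  }
  # Set @timestamp from the captured value, then drop the temporary field
  date {
    match => [ "event_time", "ISO8601" ]
    remove_field => [ "event_time" ]
  }
  # Parse the JSON into top-level fields; drop the raw line only if parsing succeeds
  json {
    source => "message"
    remove_field => [ "message" ]
  }
}
```

This trades the `json_lines` codec for a `line` codec plus a `json` filter, so each line is scanned once by the regex and once by the JSON parser. If no timestamp is found, `grok` tags the event with `_grokparsefailure` and `@timestamp` keeps its default value (the time Logstash received the event).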
It's occurred to me to customize the `json_lines` codec to output a `message` field consisting of the entire (unparsed) input line. Then I could extract (e.g. with `grok`) the timestamp from that message text, and then remove the `message` field before output. However, I'd prefer to find a solution that does not involve a custom codec, while still being reasonably performant.
Thoughts, suggestions welcome.
P.S. I've added a related issue (feature request) in GitHub for `json_lines`: "Add option to output unparsed line in message field, in addition to parsing".