Please share your wisdom: Passing Elastic key/value pairs instead of log statements?

Hi all. I'm looking for some very general advice.

I know ELK started as a way to make sense of log statements, like:
"We shipped 12 yellow rubber duckies to France".
It will pick out "yellow", "rubber" and "duckies", and try to correlate them with related statements.

This is awesome. However, I started passing key/value pairs instead of log statements. So instead of the above, I send something like:

{
  "PRODUCT": "ducky",
  "COLOR": "yellow",
  "QUANTITY": "12",
  "COUNTRY": "France"
}

So there's no text analysis needed anymore; there's seemingly nothing left for Lucene to do.

So my deep question is: does it still make sense to use a full-text search engine like Lucene? And by extension, is ELK the best solution for analyzing data that is already described by discrete keys?

Bonus question: Is my approach common? I don't understand why everyone who can doesn't convert their old logging statements to JSON directly in the code.
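To make the bonus question concrete, here is roughly what I mean, as a minimal Python sketch (the JsonFormatter class and the "fields" convention are just my own, not a standard library feature):

import json
import logging

# Serialize each record, plus any extra key/value pairs, as one JSON line.
class JsonFormatter(logging.Formatter):
    def format(self, record):
        payload = {"level": record.levelname, "message": record.getMessage()}
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("shipping")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Instead of logging "We shipped 12 yellow rubber duckies to France":
logger.info("shipment", extra={"fields": {
    "PRODUCT": "ducky",
    "COLOR": "yellow",
    "QUANTITY": "12",
    "COUNTRY": "France",
}})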

I hope it's OK to ask this here. I love ELK.

Thanks!

Hello Brian,

I think you are already on the right track. While it sometimes doesn't make sense to modify legacy applications to write JSON logs (and for some purchased software it is outright impossible), the best approach is to change the application itself to write structured logs. Some of the benefits are:

  • improved speed, since parsing on the Elastic side no longer needs complex grok regular expressions (the two approaches are sketched below)
  • improved stability, since ingest patterns no longer have to be updated whenever the log format changes
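To make the contrast concrete, here is a rough sketch of the two ingest pipeline bodies, written as Python dicts the way a client would send them to PUT _ingest/pipeline/... (the grok pattern is only an illustration for the ducky line above):

# Free-text line: the grok pattern has to describe the whole format
# and breaks whenever the wording changes.
grok_pipeline = {
    "description": "Parse free-text shipping logs (example pattern)",
    "processors": [{
        "grok": {
            "field": "message",
            "patterns": [
                "We shipped %{NUMBER:quantity} %{WORD:color} %{WORD:material}"
                " %{WORD:product} to %{WORD:country}"
            ],
        }
    }],
}

# JSON line: one generic processor and no format-specific logic at all.
json_pipeline = {
    "description": "Parse JSON logs",
    "processors": [{
        "json": {
            "field": "message",   # the raw log line as shipped
            "add_to_root": True,  # lift the parsed keys to the document root
        }
    }],
}

Every new free-text format needs another pattern like the first one; the JSON pipeline never changes.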

So, coming to your questions:
While you might not use the natural-language features of Lucene anymore, Lucene still helps you make sense of your data with its indexing, search, highlighting, and more. On top of that, Elastic gives you observability, search, and many other features that make it very useful (although I won't say "the best", as that might be biased; everyone should use a tool they are comfortable with).

I would say your approach is very common: we already ask all our applications that use the Stack to write JSON logs where possible and to adhere to the Elastic Common Schema (ECS), to minimize ingest logic on the Elastic side. For us this means that applications running in OpenShift write their JSON logs to standard output, Fluentd forwards them to Elasticsearch, and the ingest pipeline just parses the message field as JSON.
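Purely as an illustration of such an ECS-style line (the service name and ECS version here are invented, and a real application would use a logging library rather than a bare print), one log record written to standard output could look like this:

import datetime
import json
import sys

# One ECS-style log line; the field names follow the Elastic Common Schema,
# the values are made up.
record = {
    "@timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    "log.level": "info",
    "message": "shipment dispatched",
    "service.name": "shipping-service",  # invented service name
    "ecs.version": "8.11",               # pin whichever ECS version you target
    # Domain fields travel alongside the ECS ones:
    "product": "ducky",
    "color": "yellow",
    "quantity": 12,
    "country": "France",
}
print(json.dumps(record), file=sys.stdout, flush=True)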

I hope this helps.

Best regards
Wolfram

Thank you so much! This helps a great deal!

😃 🙂 🙃 😊
