I know ELK started as a way to make sense of log statements, like: "We shipped 12 yellow rubber duckies to France".
It will pick out "yellow", "rubber" and "duckies", and try to correlate them with related statements.
This is awesome. However, I started passing key/value pairs instead of log statements. So instead of the above, I send something like:
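(The field names below are only illustrative.)

```
{
  "event": "shipment",
  "quantity": 12,
  "color": "yellow",
  "item": "rubber ducky",
  "destination": "France"
}
```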
So there's no more text analysis needed; there's nothing for Lucene to do.
So my deep question is, does it still make sense to use a natural language processor like Lucene? And by extension, is ELK the best solution for analyzing data already described by discrete keys?
Bonus question: Is my approach common? I don't understand why everyone who can doesn't just convert their old logging statements to JSON directly in the code.
I think you are already on the right track: while it sometimes doesn't make sense to modify legacy applications to write JSON logs (and for some purchased software it is outright impossible), the best approach is to change the application itself to write structured logs. Some of the benefits are:
improved ingest speed, since parsing on the Elasticsearch side does not need complex grok regular expressions (see the sketch after this list)
stability, since the ingest patterns do not need to be updated whenever the log format changes
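To make the first point concrete, here is a rough sketch of the grok processor you would otherwise need to pull the duckies example apart (the pattern and field names are made up for illustration). Every wording change in the log line would break this pattern, while a JSON log avoids it entirely:

```
{
  "grok": {
    "field": "message",
    "patterns": [
      "We shipped %{NUMBER:quantity:int} %{WORD:color} %{WORD:material} %{WORD:item} to %{WORD:destination}"
    ]
  }
}
```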
So, coming to your questions:
While you might not use the natural language features of Lucene anymore, Lucene still helps you make sense of your data with its indexing, search, highlighting and more. On top of that, Elastic provides observability, search and many other features that make it very useful (although I won't say "the best", as that might be biased; everyone should use a tool they are comfortable with).
I'd say your approach is very common: we already ask all of our applications that use the Stack to write JSON logs where possible and to adhere to ECS (the Elastic Common Schema) to minimize ingest logic on the Elastic side. For us, this means that applications running in OpenShift write their JSON logs to standard output, Fluentd forwards the logs to Elasticsearch, and the ingest pipeline simply parses the message as JSON.
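For reference, "just parse the message as JSON" can be as small as an ingest pipeline with a single json processor. This is only a sketch (the pipeline name is made up), but the json processor and its add_to_root option are standard:

```
PUT _ingest/pipeline/parse-json-logs
{
  "description": "Parse the JSON log line forwarded by Fluentd",
  "processors": [
    {
      "json": {
        "field": "message",
        "add_to_root": true
      }
    }
  ]
}
```

With add_to_root set, the parsed keys become top-level fields in the document, so an ECS-shaped log needs no further parsing logic in the pipeline.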