Centralizing Business Events: RFC for log4j2 appenders for Elasticsearch

(Jean Francois Contour) #1

Centralizing business events is a very common need that could be handled much more easily, and that is the subject of this "Request For Comments". I have tried to keep it as short as possible.

Unlike a technical (think log4j) event, a business event:

  • has a potentially complex structure
  • should never be lost (think audit purposes)
  • could trigger more use cases if stored as soon as possible (e.g. with Watcher)

There are already a few open source projects that implement a log4j appender for Elasticsearch:

and even for log4js (JavaScript):

The trouble is that they don't meet the requirements above. Furthermore, it would be very useful:

  • to have appenders that rely on the Elasticsearch REST API (rather than the Java API, which relies on the Transport Client or a client node) to minimize coupling between client and Elasticsearch versions (including Java versions!).
  • to complement Elasticsearch's resiliency (see the OOM resiliency and "loss of documents during network partition" items at https://www.elastic.co/guide/en/elasticsearch/resiliency/current/index.html, as well as https://aphyr.com/posts/323-call-me-maybe-elasticsearch-1-5-0).
  • to have minimal impact on user (client thread) response time.
  • to log events safely even if Elasticsearch (or the network) is down, or even if the client JVM crashes a nanosecond after the client application received an OK from the logging service.

I eventually came to the conclusion that two Elasticsearch log4j2 appenders were needed:

  1. a "safe" appender that would guarantee logging by writing the event to disk (a potential java.io exception would be handled by the client application, possibly to roll back previous related updates). An asynchronous process (Quartz...) would then periodically try to index the event in Elasticsearch.
  2. a "fast" appender that would index the event directly in Elasticsearch (also via the REST API) for as-soon-as-possible indexing.

It would then be easy to develop a Business Event Log API (potentially embedded in some Java framework) that would do the following:

  • serialize the input Java bean (representing the business event) to JSON
  • potentially complement the JSON with other information (node name, method name, ...)
  • generate a UUID that is friendly to Elasticsearch (such as a URL-safe Base64-encoded UUIDv1)
  • synchronously log the JSON to disk using the safe appender (file name = <uuid>.json), then asynchronously send an HTTP PUT using the UUID as document ID, and remove the file from the file system once it is indexed (or found to be already indexed)
  • asynchronously (in a new Java thread) log the JSON using the fast appender (HTTP PUT using the UUID)
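The identifier steps above can be sketched in Java. This is a minimal sketch, not part of any existing library: the class `BusinessEventIds`, the index name and the `/_doc/` path (Elasticsearch 7+ syntax) are assumptions. Because the JDK only generates random (version 4) UUIDs, the sketch encodes a v4 UUID; a time-based UUIDv1 as suggested above would need a third-party generator.

```java
import java.nio.ByteBuffer;
import java.util.Base64;
import java.util.UUID;

// Hypothetical helper for the ids used by both appenders (not an existing API).
final class BusinessEventIds {

    private BusinessEventIds() {}

    // NOTE: the JDK only offers random (version 4) UUIDs; a UUIDv1 generator
    // would require a third-party library and is not shown here.
    static String newEventId() {
        return encode(UUID.randomUUID());
    }

    // Encodes the 128 bits of a UUID as URL-safe Base64 without padding (22 chars).
    static String encode(UUID uuid) {
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.putLong(uuid.getMostSignificantBits());
        buf.putLong(uuid.getLeastSignificantBits());
        return Base64.getUrlEncoder().withoutPadding().encodeToString(buf.array());
    }

    // Name of the spool file written synchronously by the "safe" appender.
    static String spoolFileName(String eventId) {
        return eventId + ".json";
    }

    // Path of the HTTP PUT sent by both appenders (assumed Elasticsearch 7+ syntax).
    static String putPath(String index, String eventId) {
        return "/" + index + "/_doc/" + eventId;
    }
}
```

Encoding the 16 UUID bytes as URL-safe Base64 without padding yields a 22-character id that can serve as both the spool file name and the document id in the PUT path without further escaping; because both appenders PUT to the same id, the second write cannot create a duplicate document.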

A double HTTP PUT (with a minimum time interval in between) should guarantee that no event is lost, even under dramatic circumstances. This would avoid having to insert events into a "safer" database before indexing them into ES.
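One way to enforce the minimum interval between the two PUTs is to let the periodic job of the "safe" appender pick up only spool files older than a given age. A minimal sketch under that assumption; `SpoolScanner` and the one-file-per-event layout (`<uuid>.json` in a single spool directory) are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical selection step of the "safe" appender's periodic job: only spool
// files older than minAge are re-submitted, so the "fast" PUT gets a head start.
final class SpoolScanner {

    private SpoolScanner() {}

    static List<Path> dueForResubmission(Path spoolDir, Duration minAge, Instant now)
            throws IOException {
        Instant cutoff = now.minus(minAge);
        try (Stream<Path> files = Files.list(spoolDir)) {
            return files
                    .filter(p -> p.getFileName().toString().endsWith(".json"))
                    .filter(p -> lastModifiedBefore(p, cutoff))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    private static boolean lastModifiedBefore(Path p, Instant cutoff) {
        try {
            return Files.getLastModifiedTime(p).toInstant().isBefore(cutoff);
        } catch (IOException e) {
            // The file may have vanished (already indexed and removed): skip it.
            return false;
        }
    }
}
```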

The second HTTP PUT should require very few Elasticsearch resources (no JSON parsing or Lucene indexing).

The UUID has to be passed to both appenders (for instance, by configuring the name of the UUID field in the JSON structure).

The asynchronous part of the "safe" appender has to make decisions upon receiving errors such as MapperParsingException. Such a mis-mapped event type won't be indexed unless the mapping of the regular index is changed. A possible solution would be to index these events in an alternate ES index (a kind of dead-letter queue): ESIndexName_errors. Indexing there will create new mappings (string types only), and I presume the events can't be rejected this time.
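The decisions described above could be captured in a small decision table. This sketch assumes the safe appender sends its PUT with `op_type=create`, so that Elasticsearch answers 409 Conflict when the "fast" appender has already indexed the document; `SafeAppenderPolicy` and `ResubmitAction` are hypothetical names, and matching on the error text in the response body is a simplification.

```java
// Hypothetical outcomes for one spool file of the "safe" appender.
enum ResubmitAction { DELETE_SPOOL_FILE, SEND_TO_ERROR_INDEX, RETRY_LATER }

final class SafeAppenderPolicy {

    private SafeAppenderPolicy() {}

    // PUT with op_type=create: Elasticsearch answers 409 Conflict instead of
    // overwriting when a document with this id already exists.
    static String createPath(String index, String eventId) {
        return "/" + index + "/_doc/" + eventId + "?op_type=create";
    }

    // Dead-letter index for events rejected by the regular mapping.
    static String errorIndex(String index) {
        return index + "_errors";
    }

    // httpStatus < 0 stands for "no HTTP response" (connection refused, timeout...).
    static ResubmitAction onResponse(int httpStatus, String responseBody) {
        if (httpStatus == 200 || httpStatus == 201 || httpStatus == 409) {
            // Indexed now, or already indexed (e.g. by the "fast" appender).
            return ResubmitAction.DELETE_SPOOL_FILE;
        }
        if (httpStatus == 400 && responseBody != null
                && (responseBody.contains("mapper_parsing_exception")
                    || responseBody.contains("MapperParsingException"))) {
            return ResubmitAction.SEND_TO_ERROR_INDEX;
        }
        // 5xx, 429, network errors and anything unexpected: keep the spool file.
        return ResubmitAction.RETRY_LATER;
    }
}
```

Treating 409 like a success is what makes the second PUT harmless: whichever appender writes first wins, and the spool file is removed either way.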

Consolidating Business Events: Need for Elasticsearch Appenders for Log4J2
(Magnus Bäck) #2

Why the complexity of two different kinds of appenders? Why not always log to disk, possibly using https://github.com/logstash/log4j-jsonevent-layout, and use Logstash to asynchronously ship the events to Elasticsearch?

(Jean Francois Contour) #3

Thanks for the quick reply.
I see a few problems with this solution:

  • How does Logstash react to mapping errors, for instance?
  • I have used Logstash (file input) and ran into many problems (loops) when Elasticsearch was down or going down.
  • A single insert into ES does not solve the consistency issue (lost events) inside Elasticsearch (see the Aphyr post).

(Jörg Prante) #4

You can combine log4j2 appenders, one writing to a file and another to Elasticsearch, so there is no need to combine file- and network-based appending in one project.

Please also note my HTTP-based appender at https://github.com/jprante/log4j2-elasticsearch-http

The JSON object that I generate for logging can be modified/extended for sending "business events", an approach which I applied in some of my professional applications.

(Jean Francois Contour) #5

Thanks for answering, and glad to hear about the HTTP version of the log4j2-to-Elasticsearch appender.

In this HTTP version, where are the events stored before being sent to ES (disk or memory)? And in the asynchronous bulk indexing phase, how are (mapping) exceptions dealt with?

I'm not sure I fully understood your first sentence:

  • If I use a log4j file appender, will I need to put an asynchronous process in place to read the events from the file system and send them to ES?
  • If I simply combine appenders (with IDs generated by ES), every event will be indexed twice (unless there is a problem somewhere): not very good for resources.

(Jörg Prante) #6

Log messages are buffered in memory; the parameter maxActionsPerBulkRequest configures how many are collected per bulk request.
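For illustration only, in-memory buffering of this kind might look like the sketch below. It is NOT the code of log4j2-elasticsearch-http; `BulkBuffer` and its `maxActions` field merely mimic the idea behind maxActionsPerBulkRequest, the HTTP POST to `/_bulk` is left to a caller-supplied callback, and the action lines follow the Elasticsearch 7+ `_bulk` format (no `_type`).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical in-memory buffer (not the log4j2-elasticsearch-http implementation):
// events stay in memory until maxActions is reached, then are sent as one
// newline-delimited _bulk request body.
final class BulkBuffer {
    private final String index;
    private final int maxActions;
    private final Consumer<String> sender; // e.g. an HTTP POST to /_bulk (not shown)
    private final List<String> pending = new ArrayList<>();

    BulkBuffer(String index, int maxActions, Consumer<String> sender) {
        if (maxActions < 1) {
            throw new IllegalArgumentException("maxActions must be >= 1");
        }
        this.index = index;
        this.maxActions = maxActions;
        this.sender = sender;
    }

    // jsonSource must be a single-line JSON document; index and id are assumed
    // to need no JSON escaping (e.g. URL-safe Base64 ids).
    synchronized void add(String id, String jsonSource) {
        pending.add("{\"index\":{\"_index\":\"" + index + "\",\"_id\":\"" + id + "\"}}\n"
                + jsonSource + "\n");
        if (pending.size() >= maxActions) {
            flush();
        }
    }

    // Sends whatever is buffered; if the sender throws, events stay pending.
    synchronized void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sender.accept(String.join("", pending));
        pending.clear();
    }
}
```

Until the threshold is reached, events live only in the JVM heap, which is precisely the crash window the "safe" appender of the original proposal is meant to close.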

Mapping exceptions cannot be avoided if you allow arbitrary messages. For logging, fortunately, the messages should be uniform and should not contain unexpected content. Dynamic mapping should be disabled.

Log4j2 handles message reliability. For this, it provides an async API and a FailoverAppender; see https://logging.apache.org/log4j/2.x/manual/appenders.html

You can choose to connect to one or more Elasticsearch cluster nodes, or to several clusters, or to another Log4j2 appender that stores the messages that would otherwise be lost, for example in a file or a queue. This (JSON) file would have to be post-processed for resubmission. In most cases, you will be able to reset to the timestamp at which the outage occurred and logging was disconnected from ES. I admit I have never been motivated to set up such a scenario, because my Elasticsearch clusters for logging run stably.

(Ralph Goers) #7

I built a system like this for my former employer, and it is actually one of the reasons I started working on Log4j 2. The system we used was based on events that were defined in a "catalog". From the catalog we generated interfaces that could be used to create logging events. These events were based on the RFC5424Message and were sent to the EventLogger. All events sent to the EventLogger were then passed to the FlumeAppender, which forwarded them to one or more remote Flume agents. In those agents we then wrote the events to Cassandra and Solr (I believe they have since switched to HBase). The point is that, by routing all events from the various servers through Flume, they can easily be routed to whatever the ultimate destination is, without having to reconfigure all the various applications.

That said, there is nothing wrong with using an ElasticSearchAppender if it is appropriate for your circumstances. I am sure we would be happy to add it to our group of NoSQL appenders.

Also, it is my hope to create the auditing/event catalog framework I described as a new Log4j subproject.

(Jörg Prante) #8

Ralph, please feel free to copy/fork the existing code of my Log4j2 Elasticsearch appender (both native and HTTP) on GitHub.

I selected Apache 2 license so it should be no problem to incorporate the code (modified or unmodified as you wish) into your group of NoSQL appenders. Alternatively, you can set a link to my projects on your documentation site.

Best, Jörg

(Jean Francois Contour) #9

I have in mind a logging system that would leverage the "almost" schemaless nature of Elasticsearch: dynamic mapping would be used extensively, except for range data types (date and numeric), which should be mapped explicitly.

I'll give your Elasticsearch appender a try anyway, at least as a starting point.

(Jean Francois Contour) #10

Thanks for answering.
What kind of "catalogue" did you have to implement?
I looked at Flume too (no tests so far), but I still hope a simple architecture is possible (such as logging directly to ES or another data store with log4j2 appenders).

(Ralph Goers) #11

The catalog contained the definitions for the events: the event "name" or type, as well as all the attributes, their data types and some basic validation rules (min/max, enumerated lists, etc.). As the events are created, the validation rules help minimize the amount of bad data being added. The programmer would essentially do something like:

    catalog.getEvent("Login").setLoginId(loginId).logEvent();
    TransferEvent event = catalog.getEvent("transfer").setToAccount(toAccount).setFromAccount(fromAccount).setAmount(amount);
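To make the catalog idea concrete, here is a minimal runtime variant. It is a hypothetical sketch, not the system described above (which generated typed interfaces such as `TransferEvent` from the catalog): `EventCatalog`, `define`, `set` and the rule helpers are made-up names, and validation happens at runtime rather than through generated setters.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical event catalog: each event type declares its attributes and
// one validation rule per attribute; checks run at runtime.
final class EventCatalog {
    private final Map<String, Map<String, Predicate<Object>>> definitions = new HashMap<>();

    EventCatalog define(String eventType, Map<String, Predicate<Object>> attributeRules) {
        definitions.put(eventType, Map.copyOf(attributeRules));
        return this;
    }

    Event getEvent(String eventType) {
        Map<String, Predicate<Object>> rules = definitions.get(eventType);
        if (rules == null) {
            throw new IllegalArgumentException("Unknown event type: " + eventType);
        }
        return new Event(eventType, rules);
    }

    // Basic validation rules: numeric min/max, enumerated list, non-empty string.
    static Predicate<Object> range(double min, double max) {
        return v -> v instanceof Number
                && ((Number) v).doubleValue() >= min
                && ((Number) v).doubleValue() <= max;
    }

    static Predicate<Object> oneOf(String... allowed) {
        Set<String> values = Set.of(allowed);
        return v -> v instanceof String && values.contains(v);
    }

    static Predicate<Object> nonEmptyString() {
        return v -> v instanceof String && !((String) v).isEmpty();
    }

    static final class Event {
        private final String type;
        private final Map<String, Predicate<Object>> rules;
        private final Map<String, Object> attributes = new LinkedHashMap<>();

        private Event(String type, Map<String, Predicate<Object>> rules) {
            this.type = type;
            this.rules = rules;
        }

        // Rejects unknown attributes and values that fail the catalog rule.
        Event set(String name, Object value) {
            Predicate<Object> rule = rules.get(name);
            if (rule == null) {
                throw new IllegalArgumentException(type + " has no attribute " + name);
            }
            if (!rule.test(value)) {
                throw new IllegalArgumentException("Invalid " + type + "." + name + ": " + value);
            }
            attributes.put(name, value);
            return this;
        }

        // Returns the validated payload; a real version would hand it to a logger.
        Map<String, Object> logEvent() {
            Map<String, Object> payload = new LinkedHashMap<>();
            payload.put("eventType", type);
            payload.putAll(attributes);
            return Collections.unmodifiableMap(payload);
        }
    }
}
```

With, say, `define("transfer", Map.of("amount", EventCatalog.range(0.01, 10000), ...))` (limits chosen purely for illustration), a call such as `getEvent("transfer").set("amount", -5)` fails before anything is logged, which is the "minimize bad data" effect of the validation rules.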

(Jean Francois Contour) #12

It seems to me that this system could be used to replace a Single Source Of Truth datastore.
In the kind of use case I have in mind, the catalogue would be simpler (a list of event types: code, description, category) and mainly needed for grouping/sorting (user interface) and authorisation purposes (which could be ES filters). It could even be populated "a posteriori": if an event type is missing from the catalogue, the associated events are simply "hidden" from searches based on the catalogue.
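A minimal sketch of this simpler catalogue, with hypothetical names (`CatalogueEntry`, `CatalogueView`) and events represented as plain maps: uncatalogued event types are filtered out, and the codes allowed for a user's categories are computed. That set of codes is what one might turn into an Elasticsearch terms filter on the event-type field.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical "simple catalogue" entry: code, description, category.
final class CatalogueEntry {
    final String code;
    final String description;
    final String category;

    CatalogueEntry(String code, String description, String category) {
        this.code = code;
        this.description = description;
        this.category = category;
    }
}

final class CatalogueView {

    private CatalogueView() {}

    // Events whose type is missing from the catalogue are simply hidden.
    static List<Map<String, Object>> visibleEvents(List<Map<String, Object>> events,
                                                   List<CatalogueEntry> catalogue,
                                                   String typeField) {
        Set<String> known = catalogue.stream().map(e -> e.code).collect(Collectors.toSet());
        return events.stream()
                .filter(ev -> known.contains(ev.get(typeField)))
                .collect(Collectors.toList());
    }

    // Event-type codes a user may see, given the categories they are authorised for.
    static Set<String> allowedCodes(List<CatalogueEntry> catalogue, Set<String> allowedCategories) {
        return catalogue.stream()
                .filter(e -> allowedCategories.contains(e.category))
                .map(e -> e.code)
                .collect(Collectors.toSet());
    }
}
```

Populating the catalogue "a posteriori" then simply makes previously hidden events visible, since nothing is filtered at indexing time.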
