I'm trying to create a solution to analyze logs for an application that runs user-specified tasks; each task can take up to a few hours to complete, and tasks can be run in parallel. I would like to be able to report things like average task length and common task errors, and also to send mail whenever certain problematic error messages appear.
A log file would, for example, look like:
timestamp task1 begin
timestamp task1 fetching data
timestamp task1 compiling data
timestamp task2 begin
timestamp task2 warning: ....
timestamp task1 task finished
--End of log file--
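Assuming each line has the shape `<timestamp> <task-id> <message>` as in the sample above (the field names and regex here are my guesses, not a known format), a minimal parser sketch could look like:

```python
import re
from typing import Optional

# Assumed layout: "<timestamp> <task-id> <message>"; adjust the regex
# if the real timestamp contains spaces.
LINE_RE = re.compile(r"^(?P<timestamp>\S+)\s+(?P<task>\S+)\s+(?P<message>.*)$")

def parse_line(line: str) -> Optional[dict]:
    """Turn one raw log line into an event dict, or None if it doesn't match."""
    m = LINE_RE.match(line.strip())
    if not m:
        return None
    return m.groupdict()

# Example:
event = parse_line("2024-01-01T10:00:00 task1 fetching data")
# → {"timestamp": "2024-01-01T10:00:00", "task": "task1", "message": "fetching data"}
```

Each parsed event can then be indexed as-is (event-centric) or buffered until the task completes (entity-centric), which is the trade-off discussed below.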
It would also be interesting to be able to index errors, so that the user can look at how many times "critical error 2" has occurred in a given timeframe, on which servers, when its last occurrence was, which tasks were running at the time, and other information that could help them monitor the system.
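If errors end up as their own documents, that kind of monitoring question maps onto a fairly standard Elasticsearch request body. This is only a sketch of the shape such a query might take; the field names (`error_id`, `server`, `@timestamp`) are assumptions about the mapping, not anything from the original logs:

```python
# Hedged sketch: "how often did this error occur in the last 7 days,
# on which servers, and when was it last seen?" expressed as an ES
# search body (a plain dict you would pass to the search API).
error_stats_query = {
    "size": 0,  # we only want the aggregations, not the hits
    "query": {
        "bool": {
            "filter": [
                {"term": {"error_id": "critical error 2"}},
                {"range": {"@timestamp": {"gte": "now-7d", "lte": "now"}}},
            ]
        }
    },
    "aggs": {
        "per_server": {"terms": {"field": "server"}},
        "last_seen": {"max": {"field": "@timestamp"}},
    },
}
```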
I've researched this topic a bit, and understand that it's best either to re-index the log data with a transform job in ES, joining all the info about a task into a single "task" document or "error" document, or to somehow index the log as task entities or error entities from the get-go (entity-centric vs. event-centric data).
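For reference, the transform-job route would be a pivot transform grouped by task id. The sketch below shows roughly what such a transform body looks like; the index names and field names (`logs-*`, `tasks`, `task.keyword`, `@timestamp`) are placeholders for whatever the real mapping uses, and the `sync` block is only needed for a continuous transform:

```python
# Sketch of an ES pivot transform body that folds per-task events into
# one "task" document per task id (entity-centric re-indexing).
transform_body = {
    "source": {"index": "logs-*"},
    "dest": {"index": "tasks"},
    "pivot": {
        "group_by": {"task": {"terms": {"field": "task.keyword"}}},
        "aggregations": {
            "started": {"min": {"field": "@timestamp"}},
            "finished": {"max": {"field": "@timestamp"}},
            "event_count": {"value_count": {"field": "@timestamp"}},
        },
    },
    # Continuous mode: keep folding in new events as they arrive.
    "sync": {"time": {"field": "@timestamp", "delay": "60s"}},
}
```

One caveat with this shape: a min/max over timestamps gives you start, end, and duration, but it doesn't preserve the ordered list of messages inside the task document the way a custom script would.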
I was wondering if, for my use case, a good way to go about solving this is to have a Redis DB that stores log lines as they come in, plus a constantly running script that waits for a "task finished" line; once it sees one, it fetches the whole task from the DB, builds a JSON document, and puts it into Elasticsearch.
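The core of that pipeline can be sketched in a few lines. Here a plain dict stands in for Redis (in practice you'd use per-task lists via redis-py's `rpush`/`lrange`/`delete`), and the returned document is what you would hand to the Elasticsearch index call. Event dicts with `timestamp`/`task`/`message` keys and the `"task finished"` sentinel are assumptions taken from the log sample:

```python
from collections import defaultdict
from typing import Optional

# In-memory stand-in for Redis: task id -> list of buffered events.
pending = defaultdict(list)

def handle_event(event: dict) -> Optional[dict]:
    """Buffer events per task; when the task finishes, emit one task document."""
    task = event["task"]
    pending[task].append(event)
    if event["message"] != "task finished":
        return None  # task still in flight, keep buffering
    events = pending.pop(task)  # with Redis: LRANGE then DELETE
    return {
        "task": task,
        "started": events[0]["timestamp"],
        "finished": events[-1]["timestamp"],
        "events": events,
        # with Elasticsearch: es.index(index="tasks", document=doc)
    }
```

One thing to watch with this design is crash recovery: if the consumer dies between popping the events and indexing the document, the task is lost, whereas a transform job just re-runs its checkpoint.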
Is this a reasonable idea? How does it compare to transform jobs?