Dear friends,
I'm new to Elasticsearch; so far I've only used it as the backend of the Kibana instance running in production.
I have a new requirement: I have to parse application logs in real time, which until now I've done via BufferedReader.readLine() in a loop over the log files themselves.
In the new scenario the applications run as containers in OpenShift that forward their logs to Fluentd, and Fluentd sends them on to ES. I'd like to swap out the BufferedReader.readLine() loop of my existing parsing application in favour of an API call exposed by the ES Java SDK.
Is there something in the Java SDK that would fit this requirement?
Thanks for the help, Alexander. What I have is a Java application that "tail -f"s application logs by calling BufferedReader.readLine() in a loop. For each line read it does some work: for example, if the log line is of a certain type, it parses it and uses it to perform actions on a Remedy trouble-ticketing system via its Java API. This works fine when the application logs are written to disk and the standalone Java app can simply open a stream to read them.
Now a client has a microservice application deployed in OpenShift. The containers log to stdout, the logs are already being collected by Fluentd, and Fluentd forwards them to an ES instance that is also managed directly by the client.
In order to deploy my Java app I need to be able to "tail -f" those logs. As far as I can tell I have two options:

1. Ask the client to update the Fluentd configuration to add a second output (out_file) that writes copies of the application logs to a Volume, and start a containerized version of the Java app that reads those logs from the Volume.
2. Access those logs directly from Elasticsearch via its Java API, in which case the only change to the existing Java app would be to replace the BufferedReader.readLine() loop with something else and keep everything else as-is.
Ah, now things become a bit clearer. You could execute a search against the Elasticsearch index and use that instead of your tail -f-like Java logic. However, as you never know how long your data takes to travel from your app into Elasticsearch, I think it makes sense to have some duplicate detection in there.
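Something like the following is roughly what I mean. This is just a minimal sketch using the 7.x High Level REST Client; the host, the index pattern (`app-logs-*`), the field names (`message`, `@timestamp`), and the poll interval are all assumptions you'd need to adapt to the client's actual mapping:

```java
import java.time.Instant;
import java.util.LinkedHashSet;
import java.util.Set;

import org.apache.http.HttpHost;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.sort.SortOrder;

public class EsLogTailer {

    public static void main(String[] args) throws Exception {
        try (RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http")))) {

            long lastSeenMillis = System.currentTimeMillis();
            // Remember recently processed document IDs: because of ingestion
            // lag, a document can show up in two consecutive polls. In a real
            // app this set would need pruning.
            Set<String> seenIds = new LinkedHashSet<>();

            while (true) {
                // gte() deliberately overlaps the previous window; the ID set
                // above filters out the resulting duplicates.
                SearchRequest request = new SearchRequest("app-logs-*")
                        .source(new SearchSourceBuilder()
                                .query(QueryBuilders.rangeQuery("@timestamp").gte(lastSeenMillis))
                                .sort("@timestamp", SortOrder.ASC)
                                .size(1000));

                SearchResponse response = client.search(request, RequestOptions.DEFAULT);
                for (SearchHit hit : response.getHits().getHits()) {
                    if (!seenIds.add(hit.getId())) {
                        continue; // already processed in an earlier poll
                    }
                    String message = (String) hit.getSourceAsMap().get("message");
                    handleLine(message); // same per-line logic as the old readLine() loop
                    // Assumes @timestamp is an ISO-8601 string; advance the cursor.
                    lastSeenMillis = Instant
                            .parse((String) hit.getSourceAsMap().get("@timestamp"))
                            .toEpochMilli();
                }
                Thread.sleep(5_000); // poll interval, tune to the expected log volume
            }
        }
    }

    // Stands in for the existing processing, e.g. opening Remedy tickets.
    private static void handleLine(String message) {
        System.out.println(message);
    }
}
```

Tracking a timestamp cursor with a slightly overlapping window plus a set of seen document IDs is one way to get that duplicate detection; on newer clusters, search_after would give you a more robust cursor than a raw timestamp.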
Just to be sure in terms of understanding: a scroll search is a point-in-time snapshot, so it does not update continuously; you would need to execute a new one after processing the current batch.
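To illustrate, here is a minimal sketch of draining one such snapshot with the High Level REST Client (the index name, page size, and keep-alive are placeholder assumptions). Once the loop ends, the scroll is cleared, and you would have to issue a fresh search to see anything indexed in the meantime:

```java
import java.io.IOException;

import org.elasticsearch.action.search.ClearScrollRequest;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchScrollRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;

public class ScrollSnapshot {

    /** Drains one point-in-time snapshot of the index, page by page. */
    static void drainSnapshot(RestHighLevelClient client) throws IOException {
        // "1m" is the keep-alive between successive fetches,
        // not the lifetime of the data.
        SearchRequest searchRequest = new SearchRequest("app-logs-*")
                .scroll("1m")
                .source(new SearchSourceBuilder()
                        .query(QueryBuilders.matchAllQuery())
                        .size(1000));
        SearchResponse response = client.search(searchRequest, RequestOptions.DEFAULT);
        String scrollId = response.getScrollId();

        while (response.getHits().getHits().length > 0) {
            for (SearchHit hit : response.getHits().getHits()) {
                // Process each log document, as in the polling sketch above.
                System.out.println(hit.getSourceAsString());
            }
            // Fetch the next page of the SAME snapshot.
            SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId).scroll("1m");
            response = client.scroll(scrollRequest, RequestOptions.DEFAULT);
            scrollId = response.getScrollId();
        }

        // Release the server-side scroll context. Documents indexed after the
        // initial search are not visible in this scroll; a new search is
        // needed to pick them up.
        ClearScrollRequest clearScroll = new ClearScrollRequest();
        clearScroll.addScrollId(scrollId);
        client.clearScroll(clearScroll, RequestOptions.DEFAULT);
    }
}
```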
Ah OK, I thought "scroll" implied something that keeps scrolling through the data continuously, but since it doesn't, scrolling isn't what I should be looking into!