I have log lines forwarded to Elasticsearch, where each log line
becomes one ES document. Every line carries a connection session id,
and a single connection session is described by several log lines.
For example, assume a connection session is identified by the id f788i7b0. That session is described by the following logs:
Feb 16 17:15:07 slot1/APM notice tmm1[11112]: 01490500:5: f788i7b0: New session from client IP 8.8.8.8 (ST=/CC=/C=) at VIP 9.9.9.9 Listener /Common/vs_stack.com (Reputation=Unknown)
Feb 16 17:15:07 slot1/APM notice apd[5515]: 01490010:5: f788i7b0: Username 'stack.com\lincohn'
Feb 16 17:15:07 slot1/APM notice apd[5515]: 01490005:5: f788i7b0: Following rule 'fallback' from item 'SSO Credential Mapping' to ending 'Allow'
Here is the query I use to select sessions by their VIP (here 9.9.9.9):
So far, however, I am unable to fetch the session id, which is in the
same document as the VIP, and then use that session id to fetch the
username logged two lines later, that is, in another document in ES
terms.
Isn't the sky the limit as to what kind of data one can extract from an ES cluster? Isn't it possible to store the value of session_id, as one would in a normal script, and then use it to search all ES documents containing those session ids for the corresponding usernames? And all of this using only DSL queries and aggregations?
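For comparison, the "store the session id, then query again" idea can be sketched as two passes driven from a client script rather than from a single DSL query. This is only a minimal sketch: the field names `vip`, `session_id` and `message` are assumptions about how the log lines were parsed into documents (they are not taken from my mapping), and the helper names are hypothetical. The functions only build the request bodies and read an aggregation response, so they would still need to be sent with whatever ES client is in use.

```python
def session_ids_query(vip, max_sessions=1000):
    """Pass 1: request body that collects the distinct session ids seen for one VIP.

    Assumes hypothetical keyword fields named "vip" and "session_id".
    """
    return {
        "size": 0,  # only the aggregation buckets are needed, not the hits
        "query": {"term": {"vip": vip}},
        "aggs": {
            "sessions": {"terms": {"field": "session_id", "size": max_sessions}}
        },
    }


def extract_session_ids(response):
    """Read the session ids out of the terms aggregation of pass 1."""
    buckets = response["aggregations"]["sessions"]["buckets"]
    return [bucket["key"] for bucket in buckets]


def username_query(session_ids):
    """Pass 2: request body that finds the 'Username' lines of those sessions.

    Assumes the raw log text is stored in a hypothetical "message" field.
    """
    return {
        "query": {
            "bool": {
                "filter": [
                    {"terms": {"session_id": session_ids}},
                    {"match_phrase": {"message": "Username"}},
                ]
            }
        }
    }
```

The response of the first search would be passed to `extract_session_ids`, and its result to `username_query` for the second search. This does the correlation in the script, which is exactly the step I was hoping to avoid.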
I may have found an alternative. With a grok filter, there seems to be a way to consolidate multiple lines into a single event, so that all lines related to a particular session id end up connected:
The connected lines would then belong to the same ES document, which would make the search possible with DSL queries. That would be easy to implement if the marker showing that a line is part of a multi-line event were known in advance. But what if the marker is a random session id that cannot be guessed, and the log events of different sessions are intertwined rather than consecutive? I don't know yet whether that is feasible.
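To illustrate what such a consolidation would have to do when the session id is not known in advance and the lines are interleaved, here is a minimal sketch that groups lines by whatever session id they carry before indexing. It is not a Logstash configuration. The regular expressions are assumptions derived only from the three sample lines above, and the output keys `messages`, `vip` and `username` are hypothetical.

```python
import re
from collections import defaultdict

# Assumed layout, based on the sample lines: "<8-digit code>:<level>: <session id>: <text>"
LINE_RE = re.compile(r"\d{8}:\d+: (?P<sid>[0-9a-z]+): (?P<msg>.*)$")
VIP_RE = re.compile(r"at VIP (?P<vip>\S+)")
USER_RE = re.compile(r"Username '(?P<user>[^']*)'")


def consolidate(lines):
    """Merge log lines into one document per session id.

    Lines may arrive in any order and sessions may be interleaved;
    the session id found in each line decides where it goes.
    """
    sessions = defaultdict(lambda: {"messages": []})
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # line does not follow the assumed layout; skip it
        doc = sessions[match["sid"]]
        doc["messages"].append(match["msg"])
        vip = VIP_RE.search(match["msg"])
        if vip:
            doc["vip"] = vip["vip"]
        user = USER_RE.search(match["msg"])
        if user:
            doc["username"] = user["user"]
    return dict(sessions)
```

Feeding the three sample lines to `consolidate` would yield a single entry keyed by f788i7b0 that holds both the VIP and the username, which is the one-document-per-session shape that a plain DSL query could then filter on.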