How to fetch a field value in an Elasticsearch document and then search other documents based on that same value

jpry · February 25, 2016, 10:08am

I've got several lines of logs forwarded to Elasticsearch. Each log line corresponds to one ES document. Each line is identified by a connection session id, and a connection session is described by several log lines.

For example, assume a connection session is identified by the id f788i7b0. That session is described by the following log lines:

Feb 16 17:15:07 slot1/APM notice tmm1[11112]: 01490500:5: f788i7b0: New session from client IP 8.8.8.8 (ST=/CC=/C=) at VIP 9.9.9.9 Listener /Common/vs_stack.com (Reputation=Unknown)

Feb 16 17:15:07 slot1/APM notice apd[5515]: 01490010:5: f788i7b0: Username 'stack.com\lincohn'

Feb 16 17:15:07 slot1/APM notice apd[5515]: 01490005:5: f788i7b0: Following rule 'fallback' from item 'SSO Credential Mapping' to ending 'Allow'

Here is the query I use to select sessions by their VIP IP (here, 9.9.9.9):

{
  "fields": ["session_id"],
  "query": {
    "term": { "vip_ip": "9.9.9.9" }
  }
}

So far, though, I have not been able to fetch the session id (which should be in the same document as the VIP IP) and then use that same session id to fetch the username two log lines later, i.e. from a different document, in ES terms.

jpry · February 25, 2016, 11:01am

Am I asking the impossible?

warkolm · February 26, 2016, 11:26pm

This sounds like a join, which ES cannot do.

jpry · February 29, 2016, 10:06am

Thanks for your answer, warkolm. If so, is there any other way to do it, for example by using another scripting language?

jpry · February 29, 2016, 11:05am

Isn't the sky the limit as to what kind of data one can extract from ES clusters? Isn't it possible to store the value of session_id, as one would in a normal script, and then use it to search all ES documents containing these session IDs again, looking for the corresponding user IDs? And to do the whole thing with only DSL queries and aggs?

warkolm · February 29, 2016, 9:20pm

You can do this, but you need to do it in multiple queries and then join the data outside of ES.
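The multi-query approach can be sketched roughly like this: one query collects the session ids for a VIP, a second query fetches the documents of those sessions, and the client joins the two hit lists. This is a minimal Python sketch under stated assumptions, not a tested recipe. It only builds the query bodies and performs the join, leaving out the call that sends them to ES so that it runs standalone. The session_id and vip_ip field names come from the query above; the username field is hypothetical and stands for whatever field your grok pattern extracts from the "Username" line. The hits are assumed to carry `_source`, and the bool/filter, terms and exists clauses assume ES 2.x or later.

```python
from collections import defaultdict


def session_query(vip_ip):
    """Query 1: documents for one VIP, returning only their session_id."""
    return {
        "_source": ["session_id"],
        "query": {"term": {"vip_ip": vip_ip}},
    }


def username_query(session_ids):
    """Query 2: documents of those sessions that carry a username.

    "username" is a hypothetical field name; use whatever your grok
    pattern extracts from the "Username '...'" log line.
    """
    return {
        "_source": ["session_id", "username"],
        "query": {
            "bool": {
                "filter": [
                    {"terms": {"session_id": sorted(session_ids)}},
                    {"exists": {"field": "username"}},
                ]
            }
        },
    }


def join_users(first_hits, second_hits):
    """Client-side join: map each session id from query 1 to its usernames."""
    wanted = {hit["_source"]["session_id"] for hit in first_hits}
    users = defaultdict(list)
    for hit in second_hits:
        src = hit["_source"]
        if src.get("session_id") in wanted and "username" in src:
            users[src["session_id"]].append(src["username"])
    return dict(users)


if __name__ == "__main__":
    # Hand-written hits shaped like ES "_source" hits, for illustration only.
    first = [{"_source": {"session_id": "f788i7b0"}}]
    ids = {h["_source"]["session_id"] for h in first}
    print(username_query(ids))
    second = [{"_source": {"session_id": "f788i7b0",
                           "username": "stack.com\\lincohn"}}]
    print(join_users(first, second))
```

Because the join happens in the client, results like these would not show up directly in Kibana. They would have to be displayed elsewhere or written back to ES as new documents.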

jpry · March 1, 2016, 9:45am

Even so, I guess it is impossible to integrate all this into Kibana?

jpry · March 1, 2016, 2:05pm

I may have found an alternative. Logstash's multiline codec can consolidate multiple lines into a single event, so that all lines related to a particular session id are connected:

https://www.elastic.co/guide/en/logstash/current/plugins-codecs-multiline.html

That would make the connected lines belong to the same ES document, and therefore make the search with DSL queries possible. It would be easy to implement if the indicator showing that a line is part of a multi-line event were known in advance. But what if the indicator is a random session id that cannot be guessed? And what if the different log events are interleaved rather than consecutive? I don't know yet whether that is feasible.
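For the interleaved case, one option other than the multiline codec (which merges consecutive lines by pattern) is to group lines by their session id in a small script before indexing. That way each session becomes one document no matter how its lines are interleaved. The following is a hypothetical Python sketch. It assumes the whole log is available up front (a finished batch, not a live stream) and that the session id always follows the "message-id:level:" prefix, as in the lines above. The regex, the messages field, and the second session a1b2c3d4 are illustrative assumptions.

```python
import re

# Session id = the token after the "<message id>:<level>: " prefix, e.g.
# "01490500:5: f788i7b0:". The exact shape is an assumption about these logs.
SESSION_RE = re.compile(r"\b[0-9a-f]{8}:\d+: ([0-9a-z]+):")


def group_by_session(lines):
    """Collect possibly interleaved log lines into one list per session id."""
    sessions = {}  # dicts keep insertion order, so sessions stay in first-seen order
    for line in lines:
        match = SESSION_RE.search(line)
        if match is None:
            continue  # no session id on this line; this sketch just drops it
        sessions.setdefault(match.group(1), []).append(line.rstrip("\n"))
    return sessions


def to_documents(sessions):
    """One document per session; "messages" is a hypothetical field name."""
    return [{"session_id": sid, "messages": msgs} for sid, msgs in sessions.items()]


if __name__ == "__main__":
    log = [
        "Feb 16 17:15:07 slot1/APM notice tmm1[11112]: 01490500:5: f788i7b0: "
        "New session from client IP 8.8.8.8 (ST=/CC=/C=) at VIP 9.9.9.9 "
        "Listener /Common/vs_stack.com (Reputation=Unknown)",
        # Hypothetical line from a second session, interleaved on purpose.
        "Feb 16 17:15:07 slot1/APM notice tmm1[11112]: 01490500:5: a1b2c3d4: "
        "New session from client IP 1.2.3.4",
        r"Feb 16 17:15:07 slot1/APM notice apd[5515]: 01490010:5: f788i7b0: "
        r"Username 'stack.com\lincohn'",
        "Feb 16 17:15:07 slot1/APM notice apd[5515]: 01490005:5: f788i7b0: "
        "Following rule 'fallback' from item 'SSO Credential Mapping' to ending 'Allow'",
    ]
    for doc in to_documents(group_by_session(log)):
        print(doc["session_id"], len(doc["messages"]))
```

A live stream gives no signal that a session has ended, so a streaming version would need a timeout or an end-of-session marker (possibly the "ending 'Allow'" line). That part is not shown here.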