Help with doing "Join"


(tomer zaks) #1

Hi,

I saw the ES explanation of how to do a "JOIN" in ES, but I am confused and I think I have a complicated case, so I would be happy for more help.

First I will lay out my problem:
I have about 5 logs that represent 5 stages in a process (in SQL terms, this is like having 5 tables). All the logs share the same UID. Each log contains all the parameters, but the parameters that are irrelevant for a given stage are either "0" or an empty string.

Here is a general picture:
Log a:
A: 5 , B: 0 ,C: , D:0 , E:0 , UID: xyz
Log b:
A: 0 , B: 8 ,C: , D:0 , E:0 , UID: xyz
Log c:
A: 5 , B: 0 ,C: 9 , D:0 , E:0 , UID: xyz
Log d:
A: 5 , B: 0 ,C: , D:2 , E:0 , UID: xyz
Log e:
A: 5 , B: 0 ,C: , D:0 , E:4 , UID: xyz

I want as a result:
A: 5 , B: 8 , C: 9 , D: 2 , E: 4 , UID: "xyz"


(Mark Walkom) #2

It's not clear what you want to do the "join" on, can you clarify that please?


(tomer zaks) #3

Yes
I want to join on UID; that means at the end I want:
Joined:
A: 5 , B: 8 , C: 9 , D: 2 , E: 4 , UID: "xyz"

Thanks for the reply


(Mark Walkom) #4

Ok, well the only way to use the joins described in that link is to restructure your documents so that everything is based around the UID. We refer to that as entity-centric indexing.

Otherwise you can use something like the aggregate filter in Logstash to group things before they are sent to Elasticsearch.
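A minimal sketch of such an aggregate filter, assuming the field names (A–E, UID) from the example above; the timeout value is illustrative and the config is untested:

```
filter {
  aggregate {
    # group events by the shared UID
    task_id => "%{UID}"
    code => "
      map['UID'] ||= event.get('UID')
      # keep each field's first meaningful (non-zero, non-empty) value
      ['A', 'B', 'C', 'D', 'E'].each do |f|
        v = event.get(f)
        map[f] = v unless v.nil? || v == 0 || v == '0' || v == ''
      end
    "
    # once the timeout elapses, emit one merged event per UID
    push_map_as_event_on_timeout => true
    timeout_task_id_field => "UID"
    timeout => 120
  }
}
```

Note that the aggregate filter only works reliably with a single pipeline worker (`pipeline.workers: 1`), since all events for a given task_id must pass through the same worker.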

There's no way to do that sort of join in Elasticsearch with your data looking like it currently does.


(tomer zaks) #5

The problem with the Logstash solution is that it erases all the other logs. This might be OK on my side, but it might be an issue. Does "entity centric indexing" have the same problem?

Or, alternatively, is there a way to keep the 5 logs as they are created, so that the aggregation only adds an extra log?


(Mark Walkom) #6

You can use the split filter in Logstash to keep the originals but also produce an aggregated one. Entity-centric indexing will also mean you don't maintain the original logs.

If you want both the original and the aggregated one, then you will need two copies of the data in Elasticsearch.
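To make that concrete, here is a rough sketch of that routing using the clone filter (the cloned copy gets its type set to the clone name, so you can aggregate only the copies; field names and timeout are illustrative, untested):

```
filter {
  clone {
    # the original event passes through untouched;
    # one copy is created with type => "aggregated"
    clones => ["aggregated"]
  }
  if [type] == "aggregated" {
    aggregate {
      task_id => "%{UID}"
      # merge whichever fields you need into the map here
      code => "map['UID'] ||= event.get('UID')"
      push_map_as_event_on_timeout => true
      timeout_task_id_field => "UID"
      timeout => 120
    }
  }
}
```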


(tomer zaks) #7

Ok, to make sure I understood:
So, basically, my machines will send the 5 logs to LS, and in LS I will have a split filter that duplicates these logs, and on the duplicated copies I will use the aggregation?


(Mark Walkom) #8

Yep!


(tomer zaks) #9

Does this work also with "clone" filter?


(Mark Walkom) #10

Yep.


(tomer zaks) #11

Hi @warkolm,
I tried to do as you said, as follows:

 filter {
   json {
     source => "message"
   }
   clone {
     clones => ["clone"]
   }
   if [type] == "clone" {
     aggregate {
       task_id => "%{transactionId}"
       code => "map['eventTimestamp'] = 0;
                event.set('Aggregation', true)"
       push_map_as_event_on_timeout => true
       timeout_task_id_field => "transactionId"
       timeout => 100
     }
   }
 }

but for some reason nothing is shown, neither duplicated logs nor an aggregated one :frowning: Maybe you know why?

Could this be because it doesn't handle the case where in log A field "a" is x and in log B field "a" is y?


(Mark Walkom) #15

Can you update that other thread with the info?
You won't get much help on Logstash configs here, is all :slight_smile:


(system) #16

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.