Help with doing "Join"

Hi,

I saw the Elasticsearch explanation of how to do a "JOIN" in ES, but I am confused and I think I have a complicated case, so I would be happy for more help.

First I will lay out my problem:
I have about 5 logs that represent 5 stages in a process (in SQL terms, this is like having 5 tables). All the logs share the same UID. All the logs contain the same parameters, but the parameters that are irrelevant for a given stage are either "0" or an empty string.

Here is a general picture:
Log a:
A: 5 , B: 0 ,C: , D:0 , E:0 , UID: xyz
Log b:
A: 0 , B: 8 ,C: , D:0 , E:0 , UID: xyz
Log c:
A: 5 , B: 0 ,C: 9 , D:0 , E:0 , UID: xyz
Log d:
A: 5 , B: 0 ,C: , D:2 , E:0 , UID: xyz
Log e:
A: 5 , B: 0 ,C: , D:0 , E:4 , UID: xyz

I want as a result:
A: 5 , B: 8 , C: 9 , D: 2 , E: 4 , UID: "xyz"

It's not clear what you want to do the "join" on. Can you clarify that, please?


Yes.
I want to join on UID, meaning at the end I want:
Joined:
A: 5 , B: 8 , C: 9 , D: 2 , E: 4 , UID: "xyz"

Thanks for the reply.

Ok, well the only way to use the joins that the link mentions is to restructure your documents so that everything is based around the UID. We refer to that as entity-centric indexing.

Otherwise you can use something like the aggregate filter in Logstash to group things before they are sent to Elasticsearch.
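A minimal sketch of that aggregate-filter approach, assuming the shared ID lives in a field called UID and the stage fields are named A to E as in the example above (the field names and the timeout value are illustrative, not from any real config):

 filter {
   aggregate {
     task_id => "%{UID}"
     code => "
       # copy each non-empty, non-zero stage value into the shared map
       ['A', 'B', 'C', 'D', 'E'].each do |f|
         v = event.get(f)
         map[f] = v unless v.nil? || v == 0 || v == '0' || v == ''
       end
     "
     push_map_as_event_on_timeout => true
     timeout_task_id_field => "UID"
     timeout => 120   # seconds to wait for all five stage logs
   }
 }

On timeout, the accumulated map is pushed as a single new event carrying the merged A to E values plus the UID.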

There's no way to do that sort of join in Elasticsearch with your data structured as it currently is.


The problem with the Logstash solution is that it discards all the original logs. This might be OK on my side, but it might be an issue. Does "entity-centric indexing" have the same problem?

Or, alternatively, is there a way to keep the 5 original logs and have the aggregation only add an extra log?

You can use the split filter in Logstash to keep the originals but also produce an aggregated one. Entity-centric indexing will also mean you don't maintain the original logs.
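A sketch of that shape using the clone filter instead (an assumption on my part; field names are illustrative): the original event passes through untouched, and only the copies feed the aggregate.

 filter {
   clone {
     clones => ["for_aggregation"]   # cloned copies get type "for_aggregation"
   }
   if [type] == "for_aggregation" {
     aggregate {
       task_id => "%{UID}"
       code => "map['UID'] ||= event.get('UID')"
       push_map_as_event_on_timeout => true
       timeout_task_id_field => "UID"
       timeout => 120
     }
   }
 }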

If you want both the originals and the aggregated one, then you will need two copies of the data in Elasticsearch.

Ok, to make sure I understood:
So, basically, my machines will send the 5 logs to LS, in LS I will have a split filter that duplicates these logs, and on the duplicated copies I will use the aggregation?

Yep!

Does this also work with the "clone" filter?

Yep.

Hi @warkolm,
I tried to do as you said, as follows:

 filter {
   json {
     source => "message"
   }
   clone {
     clones => ["clone"]
   }
   if [type] == "clone" {
     aggregate {
       task_id => "%{transactionId}"
       code => "map['eventTimestamp'] = 0;
                event.set('Aggregation', true)"
       push_map_as_event_on_timeout => true
       timeout_task_id_field => "transactionId"
       timeout => 100
     }
   }
 }

but for some reason nothing is shown, no duplicated logs and no aggregation :frowning: Maybe you know why?

Could this be because it doesn't deal with the fact that in log A field "a" is x while in log B field "a" is y?
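Two things worth checking in the config above: the code block only zeroes eventTimestamp and never copies the stage fields into the map, so even a pushed timeout event would be almost empty; and per the aggregate filter docs, the filter only works when Logstash runs with a single pipeline worker (-w 1). A sketch of a code block that would actually merge the non-zero stage values (field names taken from the example logs, not from a tested config):

 aggregate {
   task_id => "%{transactionId}"
   code => "
     # keep any stage value that is not missing, zero, or empty
     ['A', 'B', 'C', 'D', 'E'].each do |f|
       v = event.get(f)
       map[f] = v unless v.nil? || v == 0 || v == '0' || v == ''
     end
   "
   push_map_as_event_on_timeout => true
   timeout_task_id_field => "transactionId"
   timeout => 100
 }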

Can you update that other thread with the info?
You won't get much help with Logstash configs here, is all :slight_smile:

