Help with doing "Join"


(tomer zaks) #1

Hi,

I saw the ES explanation of how to do a "JOIN" in ES, but I am confused and I think I have a complicated case, so I would be happy for more help.

First I will lay out my problem:
I have about 5 logs that represent 5 stages in a process (in SQL terms, this is like having 5 tables). All the logs share the same UID. Each log contains all the parameters, but the parameters that are irrelevant for a given stage are either "0" or an empty string.

Here is a general picture:
Log a:
A: 5 , B: 0 ,C: , D:0 , E:0 , UID: xyz
Log b:
A: 0 , B: 8 ,C: , D:0 , E:0 , UID: xyz
Log c:
A: 5 , B: 0 ,C: 9 , D:0 , E:0 , UID: xyz
Log d:
A: 5 , B: 0 ,C: , D:2 , E:0 , UID: xyz
Log e:
A: 5 , B: 0 ,C: , D:0 , E:4 , UID: xyz

I want as a result:
A: 5 , B: 8 , C: 9 , D: 2 , E: 4 , UID: "xyz"


(Mark Walkom) #2

It's not clear what you want to do the "join" on, can you clarify that please?


(tomer zaks) #3

Yes
I want to join on UID; that means at the end I want:
Joined:
A: 5 , B: 8 , C: 9 , D: 2 , E: 4 , UID: "xyz"

Thanks for the reply


(Mark Walkom) #4

Ok, well the only way to use the joins described in that link is to restructure your documents so that everything is based around the UID. We refer to that as entity-centric indexing.

Otherwise you can use something like the aggregate filter in Logstash to group things before they are sent to Elasticsearch.
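A minimal sketch of such an aggregate filter, assuming the field names (A–E, UID) from the example above; the timeout value is illustrative and the config is untested:

```
filter {
  aggregate {
    # group events by the shared UID
    task_id => "%{UID}"
    code => "
      map['UID'] ||= event.get('UID')
      # keep each field's first meaningful (non-zero, non-empty) value
      ['A', 'B', 'C', 'D', 'E'].each do |f|
        v = event.get(f)
        map[f] = v unless v.nil? || v == 0 || v == '0' || v == ''
      end
    "
    # once the timeout elapses, emit one merged event per UID
    push_map_as_event_on_timeout => true
    timeout_task_id_field => "UID"
    timeout => 120
  }
}
```

Note that the aggregate filter only works reliably with a single pipeline worker (`pipeline.workers: 1`), since all events for a given task_id must pass through the same worker.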

There's no way to do that sort of join in Elasticsearch with your data looking like it currently does.


(tomer zaks) #5

The problem with the Logstash solution is that it erases all the other logs. This might be OK on my side, but it might be an issue. Does "entity centric indexing" have the same problem?

Or, alternatively, is there a way to keep the 5 logs as they are created, so that the aggregation only adds an extra log?


(Mark Walkom) #6

You can use the split filter in Logstash to keep the originals but also produce an aggregated one. Entity-centric indexing will also mean you don't maintain the original logs.

If you want both the original and the aggregated one, then you will need two copies of the data in Elasticsearch.
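To make that concrete, here is a rough sketch of that routing using the clone filter (the cloned copy gets its type set to the clone name, so you can aggregate only the copies; field names and timeout are illustrative, untested):

```
filter {
  clone {
    # the original event passes through untouched;
    # one copy is created with type => "aggregated"
    clones => ["aggregated"]
  }
  if [type] == "aggregated" {
    aggregate {
      task_id => "%{UID}"
      # merge whichever fields you need into the map here
      code => "map['UID'] ||= event.get('UID')"
      push_map_as_event_on_timeout => true
      timeout_task_id_field => "UID"
      timeout => 120
    }
  }
}
```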


(tomer zaks) #7

Ok, to make sure I understood:
So, basically, my machines will send the 5 logs to LS, and in LS I will have a split filter that duplicates these logs, and on the duplicated copies I will use the aggregation?


(Mark Walkom) #8

Yep!


(tomer zaks) #9

Does this work also with "clone" filter?


(Mark Walkom) #10

Yep.


(tomer zaks) #11

Hi @warkolm,
I tried to do as you said, as follows:

 filter {
   json {
     source => "message"
   }
   clone {
     clones => ["clone"]
   }
   if [type] == "clone" {
     aggregate {
       task_id => "%{transactionId}"
       code => "map['eventTimestamp'] = 0;
                event.set('Aggregation', true)"
       push_map_as_event_on_timeout => true
       timeout_task_id_field => "transactionId"
       timeout => 100
     }
   }
 }

but for some reason nothing is shown, neither duplicated logs nor an aggregated one :frowning: Maybe you know why?

Could this be because it doesn't handle the case where in log A field "a" is x and in log B field "a" is y?


(Mark Walkom) #15

Can you update that other thread with the info?
You won't get much help on Logstash configs here, is all :slight_smile:


(system) #16

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.