How to Correlate Data?

Christoph_Kiefer · April 18, 2018, 2:02pm

Dear All

Are there ways to mimic somehow the following SQL statement in Kibana / ES?

// SELECT * FROM network_table
// WHERE connection_id IN (
// SELECT connection_id FROM connection_table
// WHERE user_id = "kiefer"
// )

I'm trying to get a visualization of documents that could be joined to other documents by a specified field (connection_id).

Any hints are highly appreciated.

Best regards
Christoph

timroes · April 18, 2018, 3:02pm

That is not possible in Kibana and it's also not really possible in Elasticsearch. The general approach for a search engine like Elasticsearch would be to store your data in a denormalized form, instead of normalized forms, that you use in relational database systems, so have the network_table instead of a connection_id have the nested connection object in there.

Cheers,
Tim

timroes · April 18, 2018, 3:02pm

You can also find some additional information in the Elasticsearch docs about joining.

Christoph_Kiefer · April 25, 2018, 1:25pm

Dear Tim

Thanks a lot for your reply. I appreciate it a lot. Also, I am big fan of your blog posts.

I am trying to interpret your answer. First, you say that this is not possible in Kibana / ES unless one implements a denormalization step to bring the connection_id field from the "connection_table document" to the "network_table document". Is that right? Would that happen somewhere in Logstash?What do you suggest?

In your second answer, you post a link to ES docs about joining. Does that mean that it's nevertheless possible to do the joining in ES (Kibana?). Can we do it without changing our current mapping? Do you know of an any other example to demonstrate that?

More generally, what is Elastic's answer when a customer wants to correlate the data from various sources (as it happens naturally in the RDBMS world)?

Best regards
Christoph

timroes · April 27, 2018, 10:52am

Hi,

sorry yeah that answers where a bit confusing. Lemme clarify on that.

Elasticsearch has some build in mechanisms for SOME kind of joins, that are described in the above linked documentation. Nevertheless the general advice is: denormalize your data.

Also Kibana itself does not have support for any of those join possibilities ES offers, so you won't be able to visualize on it. We have nested aggregation support on our roadmap, but the nested fields imho doesn't solve the issue you describe above.

So in the case of visualizing that documents, you should write the user_id into the actual network_table inside a connection field, so you could query on connection.user_id to check for that, instead of writing it in two indices.

Hope that could clarify that a bit.

More generally, what is Elastic's answer when a customer wants to correlate the data from various sources (as it happens naturally in the RDBMS world)?

Denormalizing it

Cheers,
Tim

system · May 25, 2018, 10:52am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Datastructure relational data Elasticsearch	8	458	June 27, 2020
Data correlation kibana Kibana	3	2854	December 27, 2017
Workaround for 'JOIN' in Kibana/ES Elasticsearch	2	608	November 30, 2017
Corelate fields of different index patterns in kibana Kibana	5	907	July 13, 2017
Correlate field Elasticsearch	4	1235	May 18, 2017

How to Correlate Data?

Related topics