How to Correlate Data?

Dear All

Are there ways to mimic somehow the following SQL statement in Kibana / ES?

// SELECT * FROM network_table
// WHERE connection_id IN (
// SELECT connection_id FROM connection_table
// WHERE user_id = "kiefer"
// )

I'm trying to get a visualization of documents that could be joined to other documents by a specified field (connection_id).

Any hints are highly appreciated.

Best regards
Christoph

That is not possible in Kibana and it's also not really possible in Elasticsearch. The general approach for a search engine like Elasticsearch would be to store your data in a denormalized form, instead of normalized forms, that you use in relational database systems, so have the network_table instead of a connection_id have the nested connection object in there.

Cheers,
Tim

You can also find some additional information in the Elasticsearch docs about joining.

Dear Tim

Thanks a lot for your reply. I appreciate it a lot. Also, I am big fan of your blog posts.

I am trying to interpret your answer. First, you say that this is not possible in Kibana / ES unless one implements a denormalization step to bring the connection_id field from the "connection_table document" to the "network_table document". Is that right? Would that happen somewhere in Logstash?What do you suggest?

In your second answer, you post a link to ES docs about joining. Does that mean that it's nevertheless possible to do the joining in ES (Kibana?). Can we do it without changing our current mapping? Do you know of an any other example to demonstrate that?

More generally, what is Elastic's answer when a customer wants to correlate the data from various sources (as it happens naturally in the RDBMS world)?

Best regards
Christoph

Hi,

sorry yeah that answers where a bit confusing. Lemme clarify on that.

Elasticsearch has some build in mechanisms for SOME kind of joins, that are described in the above linked documentation. Nevertheless the general advice is: denormalize your data.

Also Kibana itself does not have support for any of those join possibilities ES offers, so you won't be able to visualize on it. We have nested aggregation support on our roadmap, but the nested fields imho doesn't solve the issue you describe above.

So in the case of visualizing that documents, you should write the user_id into the actual network_table inside a connection field, so you could query on connection.user_id to check for that, instead of writing it in two indices.

Hope that could clarify that a bit.

More generally, what is Elastic's answer when a customer wants to correlate the data from various sources (as it happens naturally in the RDBMS world)?

Denormalizing it :slight_smile:

Cheers,
Tim

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.