General advice on sucking in whole databases into Elastic?

Hi all. I'm working on pulling my company's relational data into Elastic using the Logstash JDBC input plugin. It's working, but I have a general question.

Currently, I've just written a JOIN over two tables covering the few fields we care about most. I can imagine writing more of those. But do people sometimes decide to just suck in the entire database? Like, one mega-JOIN?
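
For concreteness, what I have today is roughly the shape below. This is only a sketch: the driver, connection details, table names, and index name are placeholders I'm making up here, not our real setup.

```
input {
  jdbc {
    # Placeholder driver and connection details; swap in whatever your RDBMS needs
    jdbc_driver_library    => "/path/to/postgresql-42.7.3.jar"
    jdbc_driver_class      => "org.postgresql.Driver"
    jdbc_connection_string => "jdbc:postgresql://db-host:5432/appdb"
    jdbc_user              => "logstash"
    jdbc_password          => "${JDBC_PASSWORD}"
    schedule               => "*/15 * * * *"   # poll every 15 minutes
    # The two-table JOIN over the fields we care about (hypothetical tables)
    statement => "
      SELECT o.id, o.created_at, o.total, c.name AS customer_name, c.region
      FROM orders o
      JOIN customers c ON c.id = o.customer_id
    "
  }
}

output {
  elasticsearch {
    hosts       => ["http://localhost:9200"]
    index       => "orders"
    document_id => "%{id}"   # keeps repeated runs idempotent
  }
}
```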

Ours usually contain around 50 tables, with thousands (not millions) of rows. I'm not asking about costs; I'm asking whether others have found this useful and practical. I can see us doing analytics on it that we would otherwise never have imagined.

Thanks!

It really depends on the use case. What will you do with the data in Elasticsearch? Why are you putting this data into Elasticsearch?

Elasticsearch does not support joins, so if you need all the data for a specific entity, you may need to join everything before sending it to Elasticsearch.
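
For example, if the entity is an order, you would denormalize it with one wide query before indexing, something along these lines (the table and column names here are only placeholders):

```
-- One row per order item, with the customer and product attributes flattened
-- onto it, so each Elasticsearch document is self-contained and no join is
-- needed at query time.
SELECT
  i.id          AS order_item_id,
  o.id          AS order_id,
  o.created_at,
  i.quantity,
  i.unit_price,
  c.name        AS customer_name,
  c.region      AS customer_region,
  p.sku         AS product_sku,
  p.category    AS product_category
FROM order_items i
JOIN orders    o ON o.id = i.order_id
JOIN customers c ON c.id = o.customer_id
JOIN products  p ON p.id = i.product_id;
```

The grain you pick (order, order item, customer) decides what one document means, so it is worth choosing it around the questions you want to ask.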

But whether it will be useful or practical is something only your use case can answer.

That was quick!

:slight_smile:

First, I know I need to do the JOIN; that's easy with the Logstash JDBC input. I want to put it ALL into ELK so I can look for UNEXPECTED patterns.

I've mostly been learning Anomaly Detection recently. (It genuinely inspires awe.) So I want to be able to look for anomalies in our existing databases, and do so quickly and at volume. Like being able to track the variance in our top 20 most important fields and look for many potential influencers on them.
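
To make that concrete, the kind of job I have in mind looks roughly like the sketch below (run from Kibana Dev Tools). The job name, field names, bucket span, and influencers are all hypothetical here; in practice I'd add one detector per field we actually care about, plus a datafeed pointing the job at the index.

```
PUT _ml/anomaly_detectors/orders-variance-sketch
{
  "description": "Sketch only: field and influencer names are placeholders",
  "analysis_config": {
    "bucket_span": "1h",
    "detectors": [
      { "function": "varp", "field_name": "total", "detector_description": "variance of order total" },
      { "function": "varp", "field_name": "unit_price", "detector_description": "variance of unit price" }
    ],
    "influencers": [ "customer_name", "customer_region", "product_category" ]
  },
  "data_description": {
    "time_field": "created_at"
  }
}
```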

I'm sure lots of people have had the same idea. I'm trying to find out if any of them decided to pull lots of tables all at once.

Thank you, Leandro!

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.