Hi all. I'm working on pulling my company's relational data into Elasticsearch using the Logstash JDBC input. It's working, but I have a general question.
Currently, I've just written a JOIN over two tables containing a few fields we most care about. I can imagine writing more of those. But do people sometimes decide to just suck in the entire database? Like, one mega-JOIN?
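For context, the pipeline looks roughly like this. The connection details, table names, and fields are placeholders, not our real schema:

```
input {
  jdbc {
    # connection details below are placeholders
    jdbc_driver_library => "/path/to/postgresql.jar"
    jdbc_driver_class => "org.postgresql.Driver"
    jdbc_connection_string => "jdbc:postgresql://dbhost:5432/ourdb"
    jdbc_user => "logstash"
    jdbc_password => "${JDBC_PASSWORD}"
    schedule => "*/15 * * * *"
    # the two-table join over the fields we care most about (made-up names)
    statement => "SELECT o.id, o.order_date, o.total, c.name AS customer_name, c.region
                  FROM orders o JOIN customers c ON c.id = o.customer_id"
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "orders-joined"
    document_id => "%{id}"
  }
}
```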
Ours usually contain around 50 tables, with thousands--not millions--of rows. I'm not asking about costs. I'm asking whether others have found this sometimes useful and practical. I can see us doing analytics on it that we would otherwise never have imagined.
It really depends on the use case. What will you do with the data in Elasticsearch? Why are you putting this data in Elasticsearch?
Elasticsearch does not support joins, so if you need all the data for a specific entity, you may need to join everything before sending it to Elasticsearch.
But whether it will be useful or practical is something only your use case can answer.
First, I know I need to do the JOIN. That's easy with the Logstash JDBC input. I want to put it ALL into ELK to be able to look for UNEXPECTED patterns.
I've mostly been learning the Anomaly Detection features recently. (They genuinely inspire awe.) So I want to be able to look for anomalies in our existing databases, and do so quickly and in great volume. Like, be able to track the variance in our top 20 most important fields, and look for many potential influencers on them.
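To make that concrete, here's the kind of anomaly detection job I have in mind. The bucket span, field names, and influencers are only guesses based on the placeholder fields in my config above, not a job we actually run:

```
# field names and bucket span here are illustrative only
PUT _ml/anomaly_detectors/joined_db_variance
{
  "analysis_config": {
    "bucket_span": "1h",
    "detectors": [
      {
        "function": "varp",
        "field_name": "total",
        "detector_description": "variance of order totals"
      }
    ],
    "influencers": ["customer_name", "region"]
  },
  "data_description": {
    "time_field": "order_date"
  }
}
```

In practice there would be one detector per field we care about, plus a datafeed pointing the job at the index the jdbc pipeline writes to.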
I'm sure lots of people have had the same idea. I'm trying to find out if any of them decided to pull lots of tables all at once.