I'm still in the early stages of discovering Elasticsearch, so I'll surely make some bogus assertions or assumptions here; please don't hesitate to correct me.
What I understand so far is that data is fed into Elasticsearch via HTTP(S) PUT (or POST) requests, and then I can use ES and Kibana to run searches and display the results in very nice graphs.
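For instance, indexing a single document might look something like this (the index name, document id, and field names here are invented for illustration, and this assumes a local ES instance on the default port 9200):

```python
import json
import urllib.request

# Hypothetical document with made-up fields -- the real columns would
# come from the dataset's own metadata.
doc = {"sensor": "probe-1", "value": 42.7}

req = urllib.request.Request(
    url="http://localhost:9200/mydata/_doc/1",
    data=json.dumps(doc).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="PUT",  # PUT with an explicit id; POST would let ES generate one
)

# urllib.request.urlopen(req) would actually send it to a running ES node.
print(req.get_method(), req.full_url)
```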
I have a dataset here which takes quite a lot of disk space. It is saved in a custom format that is basically a huge table of values. Each column has associated metadata describing its name, its datatype, and so on.
The original code reading/writing this was written in Delphi but has been ported to Java for use in a Scala package that presents it as an RDD for Apache Spark to use as a data source.
I could import this dataset into ES via the bulk API or lots of individual PUT requests, but I suspect it would take quite a lot of time and duplicate a lot of disk space. And that import would have to be repeated each time the source dataset changes.
I was thus wondering whether I could write some sort of plugin that would allow ES to read that data directly instead of importing it.
Is this possible? Are there any drawbacks with that approach?
Thanks for any pointers on that subject.