Allow me to explain my predicament and hopefully you all will be able to enlighten me
I'm completely new to the Elastic Stack and over the last couple of days/weeks have been investigating the possibilities of what I might be able to do with (what so far seems) this superb product stack.
In my day to day job I tend to work a lot with a commercial ITSM/Process Workflow tool which of course generates a lot of data and is very flexible but does not have any really great reporting/dashboarding/visualization options.
The main reporting option is currently Crystal Reports which in itself is fine but can get pretty complex and cumbersome to maintain. Also for business users designing their own reports is pretty much impossible.
For a dashboarding/visualization option the tool relies on some build in functionality that relies on scripting/coding to visualize the KPI data in some nice graphs.
Hence my search for something more 21th century and this is where my eye fell on Kibana!
With Kibana comes of course ElasticSearch as a source and Logstash to make some sense of all the data.
Now while this explains why I have a need what is my issue? (I seem to have this all figured out don't I ).
The main problem with the tool is getting the data out of it in to logstash.
There is no direct SQL access to the backend database (we can export periodically a subset in our own format so that's an alternative)
The only available real-time entry point is a SOAP based webservice.
Another (but I don't like this that much) is a period dump to a specific file format (xml, csv,xls,..) but this might be a last possible fallback option through perhaps filebeat?
I did find the http_poller plugin (http_poller documentation) but it's unclear to me if this can work with SOAP (as I need to pass an XML request and that doesn't seem to be one of the possible parameters)
With this in mind, does anybody have some suggestions on what I should set up, and especially what plugins (or beats) I should use to either read from the (near-time) reporting database or what would be even better, a way to periodically perform a SOAP request to the tool endpoint and then parse the xml response in to ElasticSearch.
Hopefully some of you more experienced minds can help me out or at least send me towards some more documentation/videos to review that point me in the right direction.
Thanks already in advance for reading this thrilling story and I hope to become a member of the ElasticStack userbase!
The most difficult issue IMO is to create structured JSON complex documents based on flatten data as you have in RDBMS as rows.
Exporting to CSV has the same "issue" IMO.
So if your data are really flat, like you can build a document with a single SQL call, then using the tools above will work really nicely.
If your system also update records or delete records, then it become more complex. May be you are lucky and have somewhere a technical table which tells you what has been removed?
If not, you can say: ok. I'm going to reindex every day the full database from scratch...
One of the challenge is that it will be far away from real time but might work for your use case.
If you have a complex dataset, I'd recommend writing your own injection process. Read the database as you wish, aggregate data, do some transformations on the fly, enrich data with another database....
Then send the results to Elasticsearch.
Sadly reverse engineering the "true" back-end database will be impossible.
Well, with some time you can get the structure and ER-Model out out of it but the tool gets updated multiple times a year and since the database is considered black-box they have the ability to simply modify their entire database structure as they see fit.
That's also why we tend to use (at this moment) a separate reporting database. The tool itself has a mechanism that allows us to export the data that we want (without knowing the structure in the source database) in to a target database with a structure of our choosing.
I will look in to the JDBC Input as that could be a solution if we need to go with option 1 (near-time SQL dump).
With regards to updating/deleting, the system can indeed do that but we have multiple Identifiers available to see what has changed and could even just export changed records if this would be needed.
At the moment this is also the way we try to set up some more flexible reporting/dashboarding.
ITSM Tool -> Export to Custom SQL Db -> Visualize through some other Reporting tool (mainly through ODBC and yes a lot of Excel )
As you can guess this is not the most ideal situation as it has the possibility of a significant delay in updating the data.
Luckily we have a lot of control over what we export and when and we usually can get our exports down to a few minutes if need be.
Because of this however I was hoping for a SOAP oriented solution as this gives me access to the real-time data and avoids another point of failure in the chain (eliminating the intermediate sql db)
So while I'm hoping for
ITSM Tool's SOAP WS -> LogStash -> ElasticSearch -> Kibana
I might have to go for
ITSM Tool -> Export to Custom SQL Db -> LogStash -> ElasticSearch -> Kibana
(and perhaps avoid LogStash with Elastic Stack 5.0 and go directly to ES with Ingest)
Pretty much what you describe at the end:
Still hoping somebody has some idea about a Webservice Consumer type of shipper
Depends what you mean by "documents", my export target can be a RDBMS but also (single) xml file, excel file, csv file,..
I cannot export each object in my source db to a separate document, well I could but that would be highly inefficient..
Not out of the box but I can configure triggers "on modification" that performs an action, this action can be a script which would be HTTP request.
What is available Out of the Box is a SOAP based webservice that interacts directly with the source database, this webservice can be called from outside the tool and deliver me an xml response (as a webservice does).
So in this case I was hoping Beats/LogStash could do a periodical pre-defined webservice call to my existing Webservice and parse the xml response.
Whether or not it only returns changed records can be configured but I believe I should be able to simply update my index (on ES) by providing a uid?
I still get a feeling that easiest/less complex way would be to take the JDBC Input driver and perform a query on a (near-time up to date) view of the database.
If only I had a REST based webservice that would make it much easier
Perhaps in the long run I need to think about developing my own plugin that can call a SOAP WS or some middleware/ETL that can transform the XML Response in to a JSON that ES/LS can easily interpret...
Yes it might..
Seems to have all the options I need.
I dismissed it earlier since I didn't see (also because I had plugin documentation overload I think) how I could pass the request XML in the input.
I overlooked the line " # Supports all options supported by ruby's Manticore HTTP client" in the urls section.
Seems like that might be a way to encapsulate the headers/endpoint and xml request in one hash?
Time to dive in to the Manticore HTTP Client documentation and see how I can pass on the XML Request..