We are planning to use Logstash to sync data from Oracle to Elasticsearch.
There are 25+ tables in Oracle. The plan is to have a single index in Elasticsearch with nested/flattened documents that contain data from all the associated tables.
We are evaluating whether Logstash can be used to fetch records from Oracle (by joining all the tables), transform them into a nested structure, and load them into Elasticsearch. The Oracle tables can have one-to-many relationships, so the corresponding data needs to be structured as lists of objects in the JSON document, and this structure can become very complex.
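For reference, this is roughly the shape such a pipeline would take: a `jdbc` input running the join query against a tracking column, an `aggregate` filter folding the one-to-many rows into a list of objects, and an `elasticsearch` output. This is only a minimal sketch; the table/column names (`orders`, `order_items`, `updated_at`), connection details, and index name are hypothetical placeholders, not anything from our actual schema.

```conf
input {
  jdbc {
    jdbc_driver_library    => "/path/to/ojdbc8.jar"
    jdbc_driver_class      => "Java::oracle.jdbc.driver.OracleDriver"
    jdbc_connection_string => "jdbc:oracle:thin:@//db-host:1521/ORCLPDB1"
    jdbc_user              => "app_user"
    jdbc_password          => "secret"
    schedule               => "*/5 * * * *"
    # Join parent and child rows; ORDER BY the parent key so the
    # aggregate filter sees all rows of one parent consecutively.
    statement => "SELECT o.id AS order_id, o.customer, i.sku, i.qty, o.updated_at
                  FROM orders o JOIN order_items i ON i.order_id = o.id
                  WHERE o.updated_at > :sql_last_value
                  ORDER BY o.id"
    use_column_value       => true
    tracking_column        => "updated_at"
    tracking_column_type   => "timestamp"
  }
}
filter {
  aggregate {
    task_id => "%{order_id}"
    code => "
      map['order_id'] ||= event.get('order_id')
      map['customer'] ||= event.get('customer')
      map['items']    ||= []
      map['items']    << { 'sku' => event.get('sku'), 'qty' => event.get('qty') }
      event.cancel  # drop the per-row event; only the aggregated map is emitted
    "
    push_previous_map_as_event => true
    timeout => 30
  }
}
output {
  elasticsearch {
    hosts       => ["http://localhost:9200"]
    index       => "orders"
    document_id => "%{order_id}"
  }
}
```

Note that the `aggregate` filter only works correctly with `pipeline.workers` set to 1 and ordered input, which is directly relevant to the scaling question below.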
- We would like to know whether Logstash is suitable for this requirement and performant enough to load large volumes of data.
- How can we scale Logstash to improve transformation and indexing performance?
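From what I understand, per-pipeline throughput is mostly tuned via `logstash.yml` (or `pipelines.yml` for multiple pipelines); the values below are illustrative, not recommendations:

```yaml
# logstash.yml – per-pipeline tuning knobs (illustrative values)
pipeline.workers: 4        # parallel filter/output threads
                           # (must be 1 when using the aggregate filter)
pipeline.batch.size: 1000  # events each worker collects before flushing
pipeline.batch.delay: 50   # ms to wait before flushing an underfilled batch
```

If the aggregate filter forces a single worker, scaling out would presumably mean partitioning the data across multiple pipelines or Logstash instances (e.g. by key range) rather than adding workers to one pipeline.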
- If a tracking column is not present in the associated child tables, how will Logstash sync changes to that data? Will we need to write triggers that update a tracking column (maybe an updated date) on the parent table, so that Logstash picks the row up on its next run?
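If the trigger route turns out to be necessary, I imagine it would look something like the sketch below: any change to a child row bumps the parent's tracking column so the incremental query sees it. The table and column names (`orders`, `order_items`, `updated_at`) are hypothetical; this has not been validated against our schema.

```sql
-- Hypothetical Oracle trigger: touch the parent's tracking column
-- whenever a child row is inserted, updated, or deleted.
CREATE OR REPLACE TRIGGER order_items_touch_parent
AFTER INSERT OR UPDATE OR DELETE ON order_items
FOR EACH ROW
BEGIN
  UPDATE orders
     SET updated_at = SYSTIMESTAMP
   WHERE id = COALESCE(:NEW.order_id, :OLD.order_id);
END;
/
```

An alternative, if the child tables could be given their own timestamp columns instead, would be to track `GREATEST(parent.updated_at, child.updated_at)` in the driving query and avoid triggers entirely.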