We are planning to use Logstash to sync data from Oracle to Elasticsearch.
There are 25+ tables in Oracle. The plan is to have a single index in Elasticsearch with nested/flattened documents that contain data from all the associated tables.
We are evaluating whether Logstash can be used to fetch records from Oracle (by joining all the tables), transform them into a nested structure, and load them into Elasticsearch. The Oracle tables can have one-to-many relationships, so the corresponding data needs to be structured as lists of objects in the JSON document, and this structure can become very complex.
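For reference, this is roughly the shape such a pipeline would take: a `jdbc` input running the join query against a tracking column, an `aggregate` filter folding the one-to-many rows into a list of objects, and an `elasticsearch` output. This is only a minimal sketch; the table/column names (`orders`, `order_items`, `updated_at`), connection details, and index name are hypothetical placeholders, not anything from our actual schema.

```conf
input {
  jdbc {
    jdbc_driver_library    => "/path/to/ojdbc8.jar"
    jdbc_driver_class      => "Java::oracle.jdbc.driver.OracleDriver"
    jdbc_connection_string => "jdbc:oracle:thin:@//db-host:1521/ORCLPDB1"
    jdbc_user              => "app_user"
    jdbc_password          => "secret"
    schedule               => "*/5 * * * *"
    # Join parent and child rows; ORDER BY the parent key so the
    # aggregate filter sees all rows of one parent consecutively.
    statement => "SELECT o.id AS order_id, o.customer, i.sku, i.qty, o.updated_at
                  FROM orders o JOIN order_items i ON i.order_id = o.id
                  WHERE o.updated_at > :sql_last_value
                  ORDER BY o.id"
    use_column_value       => true
    tracking_column        => "updated_at"
    tracking_column_type   => "timestamp"
  }
}
filter {
  aggregate {
    task_id => "%{order_id}"
    code => "
      map['order_id'] ||= event.get('order_id')
      map['customer'] ||= event.get('customer')
      map['items']    ||= []
      map['items']    << { 'sku' => event.get('sku'), 'qty' => event.get('qty') }
      event.cancel  # drop the per-row event; only the aggregated map is emitted
    "
    push_previous_map_as_event => true
    timeout => 30
  }
}
output {
  elasticsearch {
    hosts       => ["http://localhost:9200"]
    index       => "orders"
    document_id => "%{order_id}"
  }
}
```

Note that the `aggregate` filter only works correctly with `pipeline.workers` set to 1 and ordered input, which is directly relevant to the scaling question below.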
- We would like to know whether Logstash is suitable for this requirement and performant enough to load large volumes of data.
- How can we scale Logstash to improve transformation and indexing performance?
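From what I understand, per-pipeline throughput is mostly tuned via `logstash.yml` (or `pipelines.yml` for multiple pipelines); the values below are illustrative, not recommendations:

```yaml
# logstash.yml – per-pipeline tuning knobs (illustrative values)
pipeline.workers: 4        # parallel filter/output threads
                           # (must be 1 when using the aggregate filter)
pipeline.batch.size: 1000  # events each worker collects before flushing
pipeline.batch.delay: 50   # ms to wait before flushing an underfilled batch
```

If the aggregate filter forces a single worker, scaling out would presumably mean partitioning the data across multiple pipelines or Logstash instances (e.g. by key range) rather than adding workers to one pipeline.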
- If a tracking column is not present in the associated child tables, how will Logstash sync changes to that data? Will we need to write triggers that update a tracking column (maybe an updated date) on the parent table, so that Logstash picks the row up on its next run?
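If the trigger route turns out to be necessary, I imagine it would look something like the sketch below: any change to a child row bumps the parent's tracking column so the incremental query sees it. The table and column names (`orders`, `order_items`, `updated_at`) are hypothetical; this has not been validated against our schema.

```sql
-- Hypothetical Oracle trigger: touch the parent's tracking column
-- whenever a child row is inserted, updated, or deleted.
CREATE OR REPLACE TRIGGER order_items_touch_parent
AFTER INSERT OR UPDATE OR DELETE ON order_items
FOR EACH ROW
BEGIN
  UPDATE orders
     SET updated_at = SYSTIMESTAMP
   WHERE id = COALESCE(:NEW.order_id, :OLD.order_id);
END;
/
```

An alternative, if the child tables could be given their own timestamp columns instead, would be to track `GREATEST(parent.updated_at, child.updated_at)` in the driving query and avoid triggers entirely.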