We are trying to load data from oracle database to elasticsearch using logstash. We have created a config file with input section having the driver info, connection to db etc. The data is extracted using select query and the output points to the data nodes of the elastic cluster.
After doing the config updates, we are now trying to get maximum data loading speed. But the max we could get was 25 GB per hour.
Do you have any advise on what to be tuned, where to look at for finding the bottleneck?
We are using elastic and logstash 7.1.1.
Logstash workers = 12 (16 cpus) and pipeline batch size is 64000. JVM is 64G on 128 G server.
Elastic has 7 data nodes. Each having jvm 32G off 64 G machine.
For verifing that, I loaded the oracle table output to another elasticsearch first. Then used logstasgh to load from that elasticsearch to this one, the speed remains the same, 25 G per hour.
Have you looked at disk I/O, iowait and network performance to see if any of those seem to be limiting throughput? How many indices and shards are you actively indexing into?
Logstash is able to load data from elastic cluster 1 to cluster 2 in 1 GB per min speed. Its only when we pull data from oracle database, the loading is slow.
We also observed that logstah spends some time processing data where teh cpu/meme goes up in the logstash's server, and after a while the data load takes place.
So can we assume that when the data gets "formatted" from oracle database's form to elasticsearch form, the process takes time?
How quickly are you able to load from Oracle if you e.g. just write data directly to file instead? This should give an indication of the extraction speed from Oracle.
If you can partition the query you might be able to run several jdbc input plugins in parallel, which might improve performance. It could also be that it is the database and not JDBC that is the limiting factor.
If you have any faster way to get the results of the query out to disk you could try using a file input plugin instead together with this and see if that speeds things up.
I exported a huge amount of data to csv and then uploaded to elasticsearch. The performance is acceptable. The export to csv was taking long time though.
Are you aware of any other way in which we can load data from oracle database to elasticsearch?
Apache, Apache Lucene, Apache Hadoop, Hadoop, HDFS and the yellow elephant
logo are trademarks of the
Apache Software Foundation
in the United States and/or other countries.