Tuning logstash and elasticsearch for loading data from oracle database

Hi Team,

We are trying to load data from oracle database to elasticsearch using logstash. We have created a config file with input section having the driver info, connection to db etc. The data is extracted using select query and the output points to the data nodes of the elastic cluster.

After doing the config updates, we are now trying to get maximum data loading speed. But the max we could get was 25 GB per hour.

Do you have any advise on what to be tuned, where to look at for finding the bottleneck?

We are using elastic and logstash 7.1.1.

Logstash workers = 12 (16 cpus) and pipeline batch size is 64000. JVM is 64G on 128 G server.
Elastic has 7 data nodes. Each having jvm 32G off 64 G machine.

Thank You in Advance
Aby.

How have you verified that it is Elasticsearch that is the bottleneck and not the Oracle input?

Christian,

For verifing that, I loaded the oracle table output to another elasticsearch first. Then used logstasgh to load from that elasticsearch to this one, the speed remains the same, 25 G per hour.

Thank You for the response.

Aby.

What type of storage do you have? What type of networking?

The storage is a non ssd disk and VLAN for network. Is that what you are looking for?

Have you looked at disk I/O, iowait and network performance to see if any of those seem to be limiting throughput? How many indices and shards are you actively indexing into?

Yes Christian. We had done monitoring on the nodes. The network traffic and the i/o looks okay.

For testing purpose I have taken only one index. It has 7 shards. One on each of the data node.

Logstash is able to load data from elastic cluster 1 to cluster 2 in 1 GB per min speed. Its only when we pull data from oracle database, the loading is slow.

We also observed that logstah spends some time processing data where teh cpu/meme goes up in the logstash's server, and after a while the data load takes place.

So can we assume that when the data gets "formatted" from oracle database's form to elasticsearch form, the process takes time?

How quickly are you able to load from Oracle if you e.g. just write data directly to file instead? This should give an indication of the extraction speed from Oracle.

Yes, tested just now.

If I extract the data trough a sql developer (which uses jdbc driver) it takes 7 mins for 1GB. I think the bottleneck is jdbc driver.

Is using jdbc driver for oracle database the right way or we have some other method?

If you can partition the query you might be able to run several jdbc input plugins in parallel, which might improve performance. It could also be that it is the database and not JDBC that is the limiting factor.

If you have any faster way to get the results of the query out to disk you could try using a file input plugin instead together with this and see if that speeds things up.

Yes Yes. Let me try that. Thank You Christian.

Christian,

I exported a huge amount of data to csv and then uploaded to elasticsearch. The performance is acceptable. The export to csv was taking long time though.

Are you aware of any other way in which we can load data from oracle database to elasticsearch?

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.