Oracle Data into Elastic

Hi, I have been butting my head against a brick wall for over a week now trying to get the jdbc driver working in logstash, and have rethought the approach for getting data out of Oracle.

I am currently trying to prototype approaches for visualising data with kibana, and thought I am likely to have the requirement for hundreds of these jdbc configs for a final solution, all on there own schedule, which is unlikely to match the databases data generation schedule. So what would be a better thing to do than get oracle to send data directly into an elastic data store.

This would remove extraneous layers but also line up the schedules.

Are there an examples of how many this could be set up? preferably without interacting with logstash, but if that's not possible then can handle going via logstash, or some other mid tier interface.

You need to send your existing data to Elasticsearch.
That means:

  • Read from the database (SELECT * from TABLE)
  • Convert each record to a JSON Document
  • Send the json document to Elasticsearch, preferably using the _bulk API.

Logstash can help for that. But I'd recommend modifying the application layer if possible and send data to Elasticsearch in the same "transaction" as you are sending your data to the database.

I shared most of my thoughts there: Advanced Search for Your Legacy Application - -Xmx128gb -Xms128gb

Have also a look at this "live coding" recording.

if you don't want to deal with logstash then use python. I have many python code which are pulling data out from oracle,mysql,sqlserver

you can use one of these method pyodbc or cx_oracle
here are sample example. In my personal view python is the way to go. I have many logstash pipeline as well. but once I started using python I like it better.

import pyodbc
#### connect to database
cnxn = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};SERVER=' + server + ';DATABASE=' + database + ';UID=' + username + ';PWD=' + password)
sql = """ select * from table """
    # Read data from SQLSearver
    df = pandas.read_sql(sql, cnxn)
    # Close sqlserver connection
    del cnxn

or you can use cx_oracle

import cx_Oracle
## make connection to oracle
 dsn_tns = cx_Oracle.makedsn(db_server, '1521', service_name=db_name)
 sql="""   select * from table """
 # Open oracle connection and read all the records
 conn = cx_Oracle.connect(user=username, password=password, dsn=dsn_tns)
 #  create cursor
 c = conn.cursor()
for row in c:
   print (row[0], row[1]) # this will print first and second column

Good artical. Thanks David