Hi, I have been butting my head against a brick wall for over a week now trying to get the JDBC driver working in Logstash, and have rethought my approach to getting data out of Oracle.
I am currently prototyping approaches for visualising data with Kibana, and I am likely to need hundreds of these JDBC configs in the final solution, each on its own schedule, which is unlikely to match the database's data-generation schedule. So a better approach would be to have Oracle send data directly into an Elastic data store.
This would remove extraneous layers and also line up the schedules.
Are there any examples of how this could be set up? Preferably without involving Logstash, but if that's not possible I can handle going via Logstash or some other mid-tier interface.
You need to send your existing data to Elasticsearch.
That means:
Read from the database (SELECT * from TABLE)
Convert each record to a JSON document
Send the JSON documents to Elasticsearch, preferably using the _bulk API.
Logstash can help with that. But I'd recommend modifying the application layer if possible, so you send data to Elasticsearch in the same "transaction" as you are sending it to the database.
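The three steps above can be sketched without any client library. This is a minimal, hypothetical example (the index name, field names, and id field are all made up) that builds the newline-delimited JSON body the _bulk API expects:

```python
import json

# Hypothetical rows, as you might get from a SELECT * query.
rows = [
    {"id": 1, "name": "alpha", "created": "2023-01-01"},
    {"id": 2, "name": "beta", "created": "2023-01-02"},
]

def to_bulk_payload(rows, index_name, id_field="id"):
    """Build an NDJSON body for the Elasticsearch _bulk API:
    one action line followed by one document line per record."""
    lines = []
    for row in rows:
        lines.append(json.dumps({"index": {"_index": index_name, "_id": row[id_field]}}))
        lines.append(json.dumps(row))
    # _bulk requires a trailing newline at the end of the body
    return "\n".join(lines) + "\n"

payload = to_bulk_payload(rows, "my-index")
# POST this payload to http://<es-host>:9200/_bulk
# with the header Content-Type: application/x-ndjson
```

The official Python client (`elasticsearch` with its `helpers.bulk`) does the same framing for you, but seeing the raw body makes the two-lines-per-document shape of _bulk explicit.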
If you don't want to deal with Logstash then use Python. I have a lot of Python code that pulls data out of Oracle, MySQL, and SQL Server.
You can use one of these methods: pyodbc or cx_Oracle.
Here are some sample examples. In my personal view Python is the way to go. I have many Logstash pipelines as well, but once I started using Python I liked it better.
import pandas
import pyodbc
# Connect to the database (server, database, username and password
# must be defined elsewhere)
cnxn = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};SERVER=' + server + ';DATABASE=' + database + ';UID=' + username + ';PWD=' + password)
sql = """ select * from table """
# Read data from SQL Server into a DataFrame
df = pandas.read_sql(sql, cnxn)
# Close the SQL Server connection
cnxn.close()
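Once the DataFrame is loaded, each row can become one JSON document for Elasticsearch. A small self-contained sketch (the DataFrame contents here are made up; in practice it would come from read_sql):

```python
import pandas

# Hypothetical result set; in practice this comes from pandas.read_sql(sql, cnxn).
df = pandas.DataFrame({"id": [1, 2], "name": ["alpha", "beta"]})

# Each row becomes one dict, ready to serialise as a JSON document.
docs = df.to_dict(orient="records")
```

Note that numeric columns may come back as NumPy scalar types, which some JSON serialisers reject, so you may need to cast them to plain Python types before indexing.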
Or you can use cx_Oracle:
import cx_Oracle
# Make a connection to Oracle (db_server, db_name, username and password
# must be defined elsewhere)
dsn_tns = cx_Oracle.makedsn(db_server, '1521', service_name=db_name)
sql = """ select * from table """
# Open the Oracle connection
conn = cx_Oracle.connect(user=username, password=password, dsn=dsn_tns)
# Create a cursor and read all the records
c = conn.cursor()
c.execute(sql)
for row in c:
    print(row[0], row[1])  # prints the first and second column
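The cursor yields plain tuples, so to get JSON-ready documents you can pair each tuple with the column names. A minimal sketch (the column names and rows here are made up; with cx_Oracle the names come from `cursor.description`):

```python
def rows_to_docs(column_names, rows):
    """Pair each row tuple with its column names so every
    row becomes a dict, ready to serialise as a JSON document."""
    return [dict(zip(column_names, row)) for row in rows]

# With the cx_Oracle cursor above, the names come from the cursor itself:
#   column_names = [col[0] for col in c.description]
#   docs = rows_to_docs(column_names, c.fetchall())
sample = rows_to_docs(["ID", "NAME"], [(1, "alpha"), (2, "beta")])
```

From there the dicts can go to Elasticsearch via the _bulk API, the same as with the pyodbc/pandas route.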