We are building a search engine with Elastic, the source data is on a relational DB. We need to ingest data from the DB to Elasticsearch. We know it is doable using JDBC Input Plugin on Logstash to ingest data to ELK. Are there any other options? I'm thinking to use Python?
BTW, this is self-managed ELK stack not Elastic Cloud.
You need to send your existing data to elasticsearch.
That means:
Read from the database (SELECT * from TABLE)
Convert each record to a JSON Document
Send the json document to elasticsearch, preferably using the _bulk API.
Logstash can help for that. But I'd recommend modifying the application layer if possible and send data to elasticsearch in the same "transaction" as you are sending your data to the database.
I have started using logstash in beginning. but on later stage I have started liking python. and most all my pipeline now converted to python.
python gives me more control on transformation of data then logstash.
it all depends on what database you use and what are your use cases are.
for example logstash was little harder to learn for me as it sometime requires grok pattern and all new filters. but there is a document in elk which are very good. I have a logstash code with 400 lines.
While python was easy for me to learn. you can repet the function that you use in one pipeline to second without copying to code.
The DB would be Sql Server....
I will check out the elastic documents using phython to ingest data from RDBMS, it seemed that I saw one but it was for ELK on Elastic cloud... I will check again.
what you looking for? how to make connection to sql server using python?
### Now read from SQLSErver and process and write back to ELK
server = 'tcp:<servername>'
database = '<db_name>'
username = '<username>'
password = '<password>'
cnxn = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};SERVER=' + server + ';DATABASE=' + database + ';UID=' + username + ';PWD=' + password)
your_sql = """ select * from your_table_name """
# Read data from SQLSearver to dataframe
df = pandas.read_sql(your_sql, cnxn)
Many thanks.. indeed. I'm trying to find the way using Python Library to read data from SQL DB to ELK and compare it with using JDBC on Logstash.
Thank you very much for your information, it really helps a lot..
Apache, Apache Lucene, Apache Hadoop, Hadoop, HDFS and the yellow elephant
logo are trademarks of the
Apache Software Foundation
in the United States and/or other countries.