Ingest data from Relational DB to Elasticsearch

Hello there:

We are building a search engine with Elastic; the source data is in a relational DB, and we need to ingest it into Elasticsearch. We know it is doable using the JDBC input plugin in Logstash. Are there any other options? I'm thinking of using Python.
BTW, this is a self-managed ELK stack, not Elastic Cloud.

Please advise

Thank you in advance


You need to send your existing data to elasticsearch.
That means:

  • Read from the database (SELECT * from TABLE)
  • Convert each record to a JSON document
  • Send the JSON document to elasticsearch, preferably using the _bulk API.
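
The three steps above can be sketched in plain Python. This is a minimal, hypothetical example: it uses the stdlib sqlite3 module as a stand-in for the real database, and builds the NDJSON body that you would POST to Elasticsearch's _bulk API (or hand to a client library):

```python
import json
import sqlite3

def rows_to_bulk_body(columns, rows, index_name):
    """Steps 2-3: convert each record to JSON and build a _bulk request body."""
    lines = []
    for row in rows:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(dict(zip(columns, row))))
    return "\n".join(lines) + "\n"  # _bulk bodies must end with a newline

# Step 1: read from the database (SELECT * FROM table)
cnxn = sqlite3.connect(":memory:")
cnxn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
cnxn.execute("INSERT INTO t VALUES (1, 'alice')")
cursor = cnxn.execute("SELECT * FROM t")
columns = [c[0] for c in cursor.description]

body = rows_to_bulk_body(columns, cursor.fetchall(), "my-index")
```

The table and index names here are placeholders; the same shape works with any DB driver that returns rows and a cursor description.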

Logstash can help with that. But I'd recommend modifying the application layer if possible and sending data to elasticsearch in the same "transaction" as you are sending your data to the database.

I shared most of my thoughts there:

Have also a look at this "live coding" recording.


Thanks a lot...

I started with Logstash in the beginning, but at a later stage I started liking Python, and almost all my pipelines are now converted to Python.
Python gives me more control over data transformation than Logstash.

Hi Sachin,

Could you elaborate a little bit on how to do that? Could you please share some documents if possible?



It all depends on what database you use and what your use cases are.

For example, Logstash was a little harder for me to learn, as it sometimes requires grok patterns and all the new filters, but there is documentation from Elastic which is very good. I have one Logstash config of 400 lines.

Python, on the other hand, was easy for me to learn. You can reuse a function from one pipeline in a second one without copying the code.
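
To illustrate that reuse point, here is a hypothetical shared transform function used by two pipelines (the function names are made up for this example):

```python
def normalize(record):
    """Shared transform: lowercase keys and strip string values."""
    return {k.lower(): v.strip() if isinstance(v, str) else v
            for k, v in record.items()}

def pipeline_one(rows):
    # First pipeline: normalize everything.
    return [normalize(r) for r in rows]

def pipeline_two(rows):
    # Second pipeline reuses the same function, no copied code.
    return [normalize(r) for r in rows if r.get("Active")]
```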

The DB would be SQL Server...
I will check the Elastic documents about using Python to ingest data from an RDBMS. It seemed I saw one, but it was for ELK on Elastic Cloud... I will check again.


What are you looking for? How to make a connection to SQL Server using Python?

    ### Now read from SQL Server, process, and write back to ELK
    import pandas
    import pyodbc

    server = 'tcp:<servername>'
    database = '<db_name>'
    username = '<username>'
    password = '<password>'

    cnxn = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};SERVER=' + server
                          + ';DATABASE=' + database + ';UID=' + username + ';PWD=' + password)

    your_sql = """ select * from your_table_name """
    # Read data from SQL Server into a dataframe
    df = pandas.read_sql(your_sql, cnxn)

And this is how you make the connection to ELK; this is for the Elasticsearch 7.x Python library:

    from elasticsearch import Elasticsearch

    es = Elasticsearch(elastic_hostnames,
                       http_auth=(elastic_admin_user, elastic_admin_passwd),
                       port=9200, timeout=3600, max_retries=4,
                       retry_on_timeout=True)

This is how you use a bulk write, where mylist is a list of dictionaries:

    from elasticsearch import helpers

    pb = helpers.bulk(es, mylist)
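
The glue between the pandas dataframe and the bulk helper can be sketched like this; the helper function name and index name are hypothetical, not from this thread:

```python
def records_to_actions(records, index_name):
    """Turn row dicts (e.g. df.to_dict(orient="records")) into bulk actions.

    Each action names the target index so helpers.bulk knows where to write.
    """
    return [{"_index": index_name, "_source": r} for r in records]

# Usage sketch with the dataframe from above:
# mylist = records_to_actions(df.to_dict(orient="records"), "my-index")
# pb = helpers.bulk(es, mylist)
```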

Hi Sachin,

Many thanks indeed. I'm trying to find a way to use a Python library to read data from the SQL DB into ELK, and to compare it with using JDBC in Logstash.
Thank you very much for your information, it really helps a lot..

