Elasticsearch real-time data sync with SQL Server

I am new to Elasticsearch and I am trying to figure out the scenarios below:

  1. How (and with which tool) do I sync/insert data into Elasticsearch when a new record is created in SQL Server? (near real time)
  2. How do I transform legacy data into Elasticsearch?

PS: I am using Elasticsearch version 7.8

If your database has a sequence or timestamp column that can be used to identify new records, then the jdbc input can track state.
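A minimal sketch of what that jdbc input can look like, using the `:sql_last_value` placeholder to pull only rows newer than the last recorded timestamp (the connection details, paths, table, and column names below are assumptions for illustration, not from this thread):

```conf
input {
  jdbc {
    # Assumed driver and connection details -- adjust to your environment
    jdbc_driver_library => "/opt/drivers/mssql-jdbc.jar"
    jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    jdbc_connection_string => "jdbc:sqlserver://my-sql-host:1433;databaseName=my_db"
    jdbc_user => "logstash_user"
    jdbc_password => "secret"

    # Only pull rows modified since the last run; Logstash persists the
    # last seen value of the tracking column between executions
    statement => "SELECT * FROM my_table WHERE modified_at > :sql_last_value"
    use_column_value => true
    tracking_column => "modified_at"
    tracking_column_type => "timestamp"
    last_run_metadata_path => "/var/lib/logstash/.jdbc_last_run"
  }
}
```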

Thanks, Badger, for the response. You mean the jdbc input through Logstash, right?

I read that Logstash has performance issues. Is there any alternative way?

Can case 1 be resolved with Beats? If yes, can you please send me a link that I can follow?

I am very new to the ELK stack, so please don't mind if my questions are silly.

Well, you posted in the Logstash forum, so I assumed you wanted to use Logstash :smiley:

I do not think there is a Beat that can do this.

The performance of Logstash depends a lot on the configuration, so I disagree with that statement. I suspect Logstash is your best bet for getting data replicated in near real time.

Logstash is a good option. I have been using Logstash with the jdbc input for about 8 months with no issues. I recommend tuning your SQL query so there is an index on the SQL side and, as @Badger mentioned, using a timestamp to track delta changes after the last execution. Something that worked for me is using a limited time range in my WHERE condition, i.e. getting only the rows modified in the last 10 minutes:

WHERE ....
AND DATEADD(mi, DATEDIFF(mi, GETUTCDATE(), GETDATE()), *my-modify-date-time-sql-column*) >=  DATEADD(mi,-10,GETDATE())

I have a sync that is close to real time, with the Logstash pipeline running every 5 minutes.
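For reference, that cadence maps to the jdbc input's cron-style `schedule` option; the expression below is an assumption matching the 5-minute interval described here, and the other settings are elided:

```conf
input {
  jdbc {
    # ...connection and statement settings as in your existing pipeline...
    schedule => "*/5 * * * *"   # cron syntax: run the query every 5 minutes
  }
}
```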

good luck

Thank you @Badger, @Christian_Dahlqvist, @christiancj. I will work on a POC using Logstash for any insert/update/delete in the table.

For scenario 2, I have to transform all the data (millions of records) into docs. Do you recommend any tool for this, or should I loop through all the records and create the docs myself?

Thanks again, all.

@Coder_Cub, use Logstash for your 'migration' load, leaving the WHERE condition of your SQL input statement open so it pulls ALL the millions of rows. I've migrated close to 4M SQL rows to Elasticsearch in a couple of hours using Logstash; the time may vary if you have a transformation process in Logstash (mutate, grok, etc.) and analyzer(s) on your Elasticsearch index. Then use a second pipeline to catch your deltas (new inserts/updates).

As a tip, use a unique ID column from SQL to define your document ID in the Elasticsearch index. This is the value that Logstash will use to find the document in Elasticsearch and update it if the doc already exists (thinking of update scenarios).

i.e.

output {
  elasticsearch {
    hosts => ["server1:9200","server2:9200"]
    document_id => "%{your_sql_unique_identifier_column}"
    index => "my_index_sample"
    doc_as_upsert => true
    action => "update"
  }
}

Awesome. Thank you so much @christiancj

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.