The performance of Logstash depends a lot on the configuration, so I disagree with your statement. I suspect Logstash is your best bet for getting data replicated in near real time.
Logstash is a good option; I have been using Logstash with the jdbc input for about 8 months with no issues. I recommend tuning your SQL query and having an index on the SQL side, and, as @Badger mentioned, using a timestamp to track delta changes since the last execution. Something that worked for me is using a limited time range in my WHERE condition, i.e. getting only rows modified in the last 10 mins:
```sql
WHERE ....
AND DATEADD(mi, DATEDIFF(mi, GETUTCDATE(), GETDATE()), *my-modify-date-time-sql-column*) >= DATEADD(mi, -10, GETDATE())
```
This gives me a sync that is close to real time, with the Logstash pipeline running every 5 mins.
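In case it helps, here is a minimal sketch of what that delta pipeline's input could look like, assuming a SQL Server source; the connection string, credentials, driver path, and the table/column names (`my_table`, `id`, `modified_at`) are placeholders you'd replace with your own:

```
input {
  jdbc {
    jdbc_connection_string => "jdbc:sqlserver://myhost:1433;databaseName=mydb"  # placeholder
    jdbc_user => "logstash"
    jdbc_password => "secret"
    jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    jdbc_driver_library => "/path/to/mssql-jdbc.jar"  # assumed driver location
    schedule => "*/5 * * * *"  # run every 5 minutes, matching the sync above
    # Pull only rows modified in the last 10 minutes, converting the UTC
    # column to local time the same way as the WHERE clause above
    statement => "SELECT id, name, modified_at
                  FROM my_table
                  WHERE DATEADD(mi, DATEDIFF(mi, GETUTCDATE(), GETDATE()), modified_at) >= DATEADD(mi, -10, GETDATE())"
  }
}
```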
For scenario 2, I have to transform all the data (millions of records) into docs. Do you recommend any tool for this, or should I loop through all the records and create the docs myself?
@Coder_Cub use Logstash for your 'migration' load, leaving the WHERE condition of your SQL input statement open so it pulls ALL the millions of rows. I've migrated close to 4M SQL rows to Elasticsearch in a couple of hours using Logstash; the time may vary if you have a transformation process in Logstash (mutate, grok, etc.) and analyzer(s) on your Elastic index. Then use a second pipeline to catch your deltas (new inserts/updates).
As a tip, use a unique ID column from SQL to define your document ID in the Elastic index; this is the value Logstash will use to find the object in Elastic and update it if the doc already exists (thinking of update scenarios).
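A rough sketch of how that one-off migration pipeline could look, with the open statement and the SQL ID mapped to the Elasticsearch document ID; again, the connection details, `my_table`, `id`, and the index name are assumptions for illustration:

```
input {
  jdbc {
    jdbc_connection_string => "jdbc:sqlserver://myhost:1433;databaseName=mydb"  # placeholder
    jdbc_user => "logstash"
    jdbc_password => "secret"
    jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    jdbc_driver_library => "/path/to/mssql-jdbc.jar"  # assumed driver location
    # No schedule and no time filter: run once and pull everything
    statement => "SELECT id, name, modified_at FROM my_table"
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "my_index"  # assumed index name
    document_id => "%{id}"  # reuse the SQL unique ID so updates overwrite instead of duplicating
  }
}
```

Because `document_id` is the SQL primary key, the delta pipeline writing to the same index will update existing docs rather than create duplicates.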