Synchonising Elastic Search with SQL server

I'm involved in a project where data is aggregated from a wide variety of datasources into an SQL Server instance. The aggregation is continuous (via a polling service), and involves large volumes of data (millions of records). A set of Elastic Search indices are populated from the SQL Server data with each index containing a graph of data from across several sql tables.. Elastic search is then used to provide fast searching of records (speed is of the essence here).

The mechanism for populating and maintaining the elastic search index is under review, and this is the reason for posting:

  • What options are there for maintaining the elastic search index in this scenario?
  • Is there anything considered as best-practice?

I realise this is somewhat of an open question, but I'm looking for suggestions or opinions on what would be worth investigating. I've read some other posts on the subject, but they suggest using "rivers" which I think have now been deprecated.

It would be great if we can get some suggestions from the community...

I've just seen a webinar on using Spark to 'join' data and load into Elasticsearch. Is this a viable option for querying data from multiple relational schemas in SQL Server, sticking it together into objects and indexing into ES.

Any suggestions and advice will be appreciated!

[Note: I'm a colleague of RikRak]

1 Like

I shared most of my thoughts there: http://david.pilato.fr/blog/2015/05/09/advanced-search-for-your-legacy-application/

Basically, I'd recommend modifying the application layer if possible and send data to elasticsearch in the same "transaction" as you are sending your data to the database.

Thanks David. I'll take a look at this.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.