I'm looking for strategies people have used to index highly normalized data from MS SQL Server or any relational database. This isn't a problem unique to SQL Server.
My current approach was to write a complex stored procedure. It joined multiple tables and used
STUFF to handle one-to-many relationships, collapsing rows from child tables into a single cell.
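For reference, the flattening looks roughly like this; the table and column names here are made up for illustration, and it's the usual STUFF + FOR XML PATH idiom:

```sql
-- Hypothetical schema: Orders (OrderId, CustomerName, ModifiedDate)
-- and OrderItems (OrderId, ProductName). Each order's items are
-- collapsed into one comma-separated cell.
SELECT
    o.OrderId,
    o.CustomerName,
    STUFF((
        SELECT ',' + oi.ProductName
        FROM OrderItems oi
        WHERE oi.OrderId = o.OrderId
        FOR XML PATH('')
    ), 1, 1, '') AS ProductNames   -- STUFF strips the leading comma
FROM Orders o
WHERE o.ModifiedDate >= @SinceDate; -- incremental pull by modified date
```

My application then splits columns like `ProductNames` back apart to rebuild the nested structure.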
My application would parse the results of the stored procedure into the document shape I want in my Elasticsearch index, then bulk insert the records using the NEST API.
This is cumbersome and feels inefficient, even running it on a beefy server. The stored procedure takes modified date parameters so I'm not pulling entire tables into memory.
Rivers are deprecated (and I wasn't using the JDBC river in the first place); they seemed to come with their own set of problems anyway. Logstash doesn't have a SQL Server input yet, and I don't know if one is planned. My Elasticsearch index includes a lot of nested objects, so it's not a flat structure.
I was wondering how other people have solved this problem, or if you have any suggestions for dealing with normalized data. I'd still like to maintain (some of) the relationships I have in my relational database. Nested objects seem to perform better than the parent/child model in Elasticsearch (according to the ES docs), though I haven't run any tests myself.
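For context, the kind of nested mapping I mean is along these lines (field names are illustrative, not my actual schema):

```json
{
  "mappings": {
    "order": {
      "properties": {
        "customer_name": { "type": "string" },
        "items": {
          "type": "nested",
          "properties": {
            "product_name": { "type": "string" },
            "quantity":     { "type": "integer" }
          }
        }
      }
    }
  }
}
```

With `"type": "nested"`, each item is indexed as its own hidden document, so queries can match on a single item's fields together instead of across the whole array.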