I'm currently trying to find an efficient way to migrate all my data from an MSSQL database to Elasticsearch.
Right now, my data in MSSQL is normalized across 4 different tables; however, I'm planning to use the following logic to denormalize it.
Let's assume I have 4 different tables that all share a foreign key. Now, if I have 4 different jobs that
each push the data from one table to my ES index, I end up with a completely denormalized document.
For example, consider an ES index type called product:
Table 1 ==> Job1 Insert or update the document with id 1 (4 fields) version 1
Table 2 ==> Job2 Update the same document with id 1 (now 4 + 5 = 9 fields) version 2
Table 3 ==> Job3 Update the same document with id 1 (now 9 + 3 = 12 fields) version 3
Table 4 ==> Job4 Update the same document with id 1 (now 12 + 6 = 18 fields) version 4
At the end, I will end up with 4 updates on the same document.
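The four incremental updates above could be sketched as partial-document upserts, where each job sends only its own table's fields and ES merges them into the existing document. This is just an illustration (the index name, ids, and fields here are hypothetical, and a real job would send these actions through the bulk API):

```python
# Sketch of the four-job upsert flow using Elasticsearch-style bulk
# "update" actions with doc_as_upsert. All names/fields are hypothetical;
# a real job would submit these actions via the _bulk endpoint.

def make_upsert_action(index, doc_id, partial_doc):
    """Build one bulk 'update' action that creates the document if it
    does not exist yet, otherwise merges the new fields into it."""
    return {
        "_op_type": "update",
        "_index": index,
        "_id": doc_id,
        "doc": partial_doc,
        "doc_as_upsert": True,
    }

# Each job contributes only the fields from its own table.
job1 = make_upsert_action("product", 1, {"name": "Widget", "sku": "W-1"})
job2 = make_upsert_action("product", 1, {"price": 9.99, "currency": "USD"})

# Simulate how ES merges the partial docs into one document over
# successive upserts (four writes in total in the scheme above).
merged = {}
for action in (job1, job2):
    merged.update(action["doc"])

print(merged)
```

Each job writing its own partial doc keeps the jobs independent, at the cost of four index writes (and four document versions) per product.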
My question now: is this going to drastically affect performance if I do bulk upserts this way?
Is this recommended, or would I be better off joining the tables before pushing the data to ES?
Any thoughts on this?
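For comparison, the join-first alternative I'm considering would assemble the complete document in application code from already-joined rows and index it once per product. A minimal sketch, assuming hypothetical table and field names:

```python
# Sketch of the join-first alternative: merge the per-table rows for
# one product id (as returned by a SQL join) into a single denormalized
# document, then write it to ES with one index operation instead of
# four upserts. Row shapes and field names are hypothetical.

rows_by_table = {
    "table1": {"name": "Widget", "sku": "W-1"},
    "table2": {"price": 9.99},
    "table3": {"stock": 42},
    "table4": {"category": "tools"},
}

def build_full_document(doc_id, rows_by_table):
    """Combine the per-table fields for one id into one document."""
    doc = {"id": doc_id}
    for partial in rows_by_table.values():
        doc.update(partial)
    return doc

doc = build_full_document(1, rows_by_table)
print(doc)  # one complete document, one write, one version
```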