I am trying to figure out the best practice for the following case:
I have two tables in a SQL database: transactions and bank_accounts (each transaction has one bank account). I would like to search the data in those tables with Elasticsearch.
In my case I need to find:
All bank accounts containing the name.
All transactions containing the bank name.
All transactions containing the description.
So if I store the bank account as nested data in every transaction, I will get a lot of transactions with the same nested data, and when a bank changes its name (for example, to fix a manager's typo) I will have to update every transaction containing that bank's name, which does not look very efficient.
Also, in that case I need two types: bank_accounts and transactions, and the transactions will contain information duplicated from the bank accounts.
So the general question is how to structure the data to get the most suitable solution for my cases.
I am very new to Elasticsearch, so I hope the community can help me find the information I need to solve this.
A common way to do this is, as you mention, to flatten the model and store the account data you need to search on with each transaction. If the account changes, a number of documents will need to be updated. If such updates are rare, this is often an acceptable trade-off, as updating a few thousand documents in Elasticsearch is quite quick and efficient. If updates are frequent, however, it may not be the ideal solution.
By flattening you gain simpler and faster queries at the cost of more expensive updates to account data. It is always a trade-off, so the right choice will depend on the access and update patterns of your data.
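To make the flattened approach concrete, here is a minimal sketch in Python that only builds the JSON bodies, so it runs without a cluster. The index name `transactions` and the field names (`bank_account_id`, `bank_name`, `description`, `status`) are assumptions for illustration, not something taken from your schema.

```python
import json


def flatten_transaction(tx, account):
    """Copy the searchable account fields into the transaction document."""
    return {
        "description": tx["description"],
        "status": tx["status"],
        # Hypothetical field names for the duplicated account data.
        "bank_account_id": account["id"],
        "bank_name": account["bank_name"],
    }


def rename_bank_request(account_id, new_name):
    """Body for POST /transactions/_update_by_query.

    It selects every transaction that carries the given account id and
    rewrites the copied bank name with a Painless script.
    """
    return {
        "query": {"term": {"bank_account_id": account_id}},
        "script": {
            "lang": "painless",
            "source": "ctx._source.bank_name = params.name",
            "params": {"name": new_name},
        },
    }


doc = flatten_transaction(
    {"description": "Invoice 17", "status": "pending"},
    {"id": 42, "bank_name": "Acme Bank"},
)
body = rename_bank_request(42, "Acme Bank Ltd")
print(json.dumps(doc, indent=2))
print(json.dumps(body, indent=2))
```

With this layout, searching transactions by bank name or by description is a plain `match` query on a single index, and a rename is one update-by-query request per affected account.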
The alternative is to keep accounts and transactions as separate documents, which requires you to structure your data using parent-child relationships. If your parent entity updates frequently it is a great choice, but it comes with overhead and limitations and results in more complex queries, so do not jump to this just because it somewhat mimics your relational structure.
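For comparison, a parent-child layout might look roughly like the sketch below. It assumes a version of Elasticsearch that has the `join` field type (6.0 and later); the field name `relation`, the relation names `bank_account` and `transaction`, and the id `account-42` are all hypothetical. Again the code only builds the request bodies.

```python
import json

# Index mapping: one join field declares bank_account as parent of transaction.
# Parent and child documents have to live in the same index.
mapping = {
    "mappings": {
        "properties": {
            "relation": {
                "type": "join",
                "relations": {"bank_account": "transaction"},
            },
            "bank_name": {"type": "text"},
            "description": {"type": "text"},
        }
    }
}

# Parent document: the account itself.
account_doc = {"bank_name": "Acme Bank", "relation": "bank_account"}


def transaction_doc(description, parent_id):
    """Child document; it must be indexed with routing=parent_id so that it
    lands on the same shard as its parent."""
    return {
        "description": description,
        "relation": {"name": "transaction", "parent": parent_id},
    }


def transactions_by_bank_name(name):
    """Search body: transactions whose parent account matches the bank name."""
    return {
        "query": {
            "has_parent": {
                "parent_type": "bank_account",
                "query": {"match": {"bank_name": name}},
            }
        }
    }


child = transaction_doc("Invoice 17", "account-42")
search = transactions_by_bank_name("acme")
print(json.dumps(search, indent=2))
```

Here a bank rename touches only the single parent document, but every search by bank name goes through `has_parent`, which is the extra query complexity and overhead mentioned above.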
So if I have, for example, 100M transactions evenly distributed between 100 bank accounts, does that mean it is better to put the transactions as nested elements inside the bank account, or vice versa?
A transaction is typically updated three times in its life cycle (status changes); bank accounts change rarely and at random, mostly because of typos.