Here is the requirement I am planning the index data model for.
- File info - a JSON document with attributes like filename, unique file ID, total number of records, number of valid records, number of invalid records, etc.
- Rejected records - the invalid records from the file, with attributes like record ID, file ID, reason for rejection, etc.
- Valid records - the records that passed validation; after the application makes an API call for each one, it emits a JSON document with the record ID, file ID, whether the API call succeeded or failed, and the failure reason if it failed.
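For illustration, the three payloads might look like the sketch below. All field names here are assumptions, not an actual schema; the point is that the three document types share a file ID that relates them.

```python
# Hypothetical shapes of the three Kafka payloads (field names are
# illustrative assumptions, not a real schema).

file_info = {
    "file_id": "f-1001",
    "filename": "customers_2024_01.csv",
    "total_records": 10,
    "valid_records": 8,
    "invalid_records": 2,
}

rejected_record = {
    "record_id": "r-7",
    "file_id": "f-1001",
    "rejection_reason": "missing mandatory field",
}

valid_record = {
    "record_id": "r-3",
    "file_id": "f-1001",
    "api_call_success": False,
    "failure_reason": "downstream service timeout",
}

# file_id is the only attribute common to all three document types,
# so it is the natural key for relating them.
shared_keys = set(file_info) & set(rejected_record) & set(valid_record)
```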
We have ruled out nested modeling, because the responses come from a Kafka producer and the JSON documents do not all arrive at the same time. The order of the JSON responses is as follows:
- Rejected records - one JSON message per record (if there are 10 rejected records, we receive 10 JSON messages from the Kafka producer)
- File info - a single JSON message
- Valid records - one JSON message per record
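Because the messages arrive one at a time in that order, the consumer has to decide per message which document type it is handling. A minimal sketch of that routing step, assuming hypothetical field names for each payload type:

```python
def classify_message(msg: dict) -> str:
    """Infer the document type of an incoming Kafka message from its fields.

    The field names checked here are illustrative assumptions; adapt
    them to the real payload schema.
    """
    if "rejection_reason" in msg:
        return "rejected_record"
    if "api_call_success" in msg:
        return "valid_record"
    if "filename" in msg:
        return "file_info"
    raise ValueError(f"unrecognised message shape: {sorted(msg)}")
```

Each message can then be indexed independently as its own document, which is what makes the flat (non-nested) model attractive here.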
In this case, appending data to nested fields (valid records / rejected records) under the file-info document is not possible without re-indexing the entire document on every update, so we ruled this approach out for performance reasons.
The other idea is to keep everything in the same index, with file info as the parent and rejected records and valid records as children, using a join field.
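A sketch of what that single-index join-field mapping could look like (field and relation names are assumptions for illustration). Note that with a join field, each child document must be indexed with `routing` set to its parent's ID so it lands on the same shard as the parent:

```python
# Sketch of a single-index parent/child mapping using the Elasticsearch
# join field type. Field and relation names are illustrative assumptions.

index_body = {
    "mappings": {
        "properties": {
            "file_id": {"type": "keyword"},
            "record_id": {"type": "keyword"},
            "doc_relation": {
                "type": "join",
                # One parent type with two child types.
                "relations": {"file_info": ["rejected_record", "valid_record"]},
            },
        }
    }
}

# Parent document: only names its relation.
parent_doc = {"file_id": "f-1001", "doc_relation": {"name": "file_info"}}

# Child document: names its relation and its parent, and must be indexed
# with ?routing=<parent id> so parent and child share a shard.
child_doc = {
    "file_id": "f-1001",
    "record_id": "r-7",
    "doc_relation": {"name": "rejected_record", "parent": "f-1001"},
}
```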
But with this approach, and given the removal of mapping types in Elasticsearch 6.x, if we keep all the information in a single index with a join field, will high sparsity be a problem? If so, and we instead use a separate index for each of the three levels of information, how do we establish the relationship between the indices? Do we have to join on the client side, or should we follow a different design approach?
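If the separate-index route is taken, the relationship would live only in the shared file ID, and the join would happen in the application after querying each index. A minimal sketch of that client-side join, assuming the three result sets come back as lists of dicts with the hypothetical field names used here:

```python
def join_file_results(file_infos: list, rejected: list, valid: list) -> dict:
    """Application-side join of three index result sets on file_id.

    Each argument is a list of dicts as returned by a query against one
    of the three indices; field names are illustrative assumptions.
    """
    # Start from the parent documents, adding empty child collections.
    by_file = {
        f["file_id"]: {**f, "rejected_records": [], "valid_records": []}
        for f in file_infos
    }
    # Attach each child record to its parent by file_id.
    for r in rejected:
        by_file[r["file_id"]]["rejected_records"].append(r)
    for v in valid:
        by_file[v["file_id"]]["valid_records"].append(v)
    return by_file
```

This keeps each index flat and append-only at the cost of an extra query per level and the merge step in the client.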