Design Validation

Hello,

Hierarchy:

parent {
	child {
		grandchild {
		
			grandgrandchild {
				....grandgrandgrandchildren
			}
			grandgrandchild {
				....grandgrandgrandchild
			}
		}
	}
}
User {

}

Rules:
A user who has access to a particular entity automatically has access to its underlying sub-entities

Non-super Users can be associated with grandgrandchild only

Super-users can be associated with parent

Access Patterns:

Search User in Parent by name

Get Users list by Parent

Get Users list by grandgrandchild

Search grandchild in child by name

Search grandgrandchild in grandchild by name

Get grandchild list by child

Get grandgrandchild list by grandchild

Index Design:

I create one index per parent, based on parentId. All children, grandchildren, grandgrandchildren and grandgrandgrandchildren of that parent are part of this index as flat (linear) documents.
Each new parent gets a new index with the same layout, and so on.

I am duplicating IDs in each record. For example, for "Get grandchild list by child",
each grandchild document has the fields "childId" and "entityType", like:

	grandchild {
		...metadata
		childId: "<id>" // will be parent for this grandchild
		entityType: "<grandchild>"
	
	}
query: GET <parentId>/_search
{
  "query": {
    "query_string": {
      "fields": [
        "childId",
        "entityType"
      ],
      "query": "<childId> AND grandchild"
    }
  }
}
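
For an exact-match lookup like this one, a `bool` query with `term` clauses in filter context is a common alternative to `query_string`. The sketch below only builds the request body as a Python dict and sends nothing. It assumes `childId` and `entityType` are mapped as `keyword` fields (the mapping is not shown in this post), and `grandchildren_by_child` is a hypothetical helper name.

```python
import json


def grandchildren_by_child(child_id, size=100):
    """Build a search body listing the grandchild documents under one child.

    Assumption: childId and entityType are keyword fields, so term
    clauses match the stored values exactly.
    """
    return {
        "size": size,
        "query": {
            "bool": {
                # filter clauses do not compute relevance scores
                "filter": [
                    {"term": {"childId": child_id}},
                    {"term": {"entityType": "grandchild"}},
                ]
            }
        },
    }


body = grandchildren_by_child("child-42")  # "child-42" is a made-up ID
print(json.dumps(body, indent=2))
```

The body would be sent to `GET <parentId>/_search` in the same way as the query above. Unlike `query_string`, each value here is tied to one specific field, so a child ID can never match against `entityType` by accident.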

Nature of Application
Application is read intensive

Questions:

  1. Is this design good? It is inspired by the single-index design. I know it does not follow that design 100%, but I tried to implement it.
  2. How can I achieve "Get Users list by grandgrandchild"? It is a many-to-many relation and, as far as I know, Elasticsearch has nothing built in to resolve many-to-many relations.
    One possible solution I am thinking of is to keep an array of userIds in each grandgrandchild, or vice versa, but I am not sure about the impact on performance because there will be 2 queries:
    i. The first query gets all user IDs (an array) for a particular grandgrandchild, which I supply to the second query as input.
    ii. The second query gets the information for each user based on that array.
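
The two variants described above could look roughly like this. It is only a sketch that builds request bodies as Python dicts. The field name `grandgrandchildIds`, the `"user"` value for `entityType`, and both function names are hypothetical, and the fields are assumed to be mapped as `keyword`.

```python
def users_by_ggc_single_query(ggc_id):
    """Variant "vice versa": every user document carries an array of the
    grandgrandchild IDs it is associated with, so one query is enough.
    A term clause on an array field matches if any element matches.
    """
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"entityType": "user"}},
                    {"term": {"grandgrandchildIds": ggc_id}},
                ]
            }
        }
    }


def users_by_ids(user_ids):
    """Two-query variant, step ii: fetch user documents by the IDs that
    step i read from the userIds array of the grandgrandchild document.
    Assumes each user's document _id is its user ID.
    """
    return {"query": {"ids": {"values": list(user_ids)}}}


single = users_by_ggc_single_query("ggc-7")   # made-up ID
second = users_by_ids(["u1", "u2", "u3"])     # made-up IDs
print(single)
print(second)
```

With the array kept on the user side, the read path needs a single search, at the cost of updating user documents whenever an association changes. Which side is cheaper to maintain depends on how often associations change, which is not stated here.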

You have not provided any information about how frequently you will be adding, removing or updating content for the different types of objects, which is an important consideration. Another consideration is how many fields the objects at different levels are likely to have.

This type of deeply nested structure is IMHO rarely optimal and there are a few common issues with this approach:

  • Deeply nested documents get very expensive to make changes to as all individual objects are stored separately behind the scenes and they are all updated/reindexed for any change to any part of the document.
  • Retrieving very large documents can get expensive and slow.
  • Queries will quickly get quite complex and can also consume a lot of resources and get slow.

I have seen this kind of approach before when users have tried to maintain some kind of relational model, which is hard as Elasticsearch is not relational. It is often much better to denormalize and flatten the structure, e.g. break each individual object out into a separate document and store the relevant fields from levels above on it.
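
As a rough illustration of that flattening, the sketch below walks a nested structure and emits one flat document per object, copying the IDs of the levels above onto each one. The input shape (`id`, `name`, `children`) and the `<level>Id` field names are assumptions for the example, not something taken from the question.

```python
LEVELS = ["parent", "child", "grandchild", "grandgrandchild"]


def flatten(node, depth=0, ancestors=None):
    """Yield one flat document per node, with ancestor IDs copied onto it."""
    ancestors = ancestors or {}
    yield {
        "id": node["id"],
        "name": node["name"],
        "entityType": LEVELS[depth],
        **ancestors,
    }
    # Descendants inherit this node's ID under a "<level>Id" field.
    inherited = {**ancestors, f"{LEVELS[depth]}Id": node["id"]}
    for child in node.get("children", []):
        yield from flatten(child, depth + 1, inherited)


# Made-up sample data, one object per level.
tree = {"id": "p1", "name": "Acme", "children": [
    {"id": "c1", "name": "East", "children": [
        {"id": "g1", "name": "Team A", "children": [
            {"id": "gg1", "name": "Unit 1"},
        ]},
    ]},
]}

docs = list(flatten(tree))
for doc in docs:
    print(doc)
```

Each resulting document can be indexed and updated on its own, and a listing such as "grandgrandchildren of a grandchild" becomes a simple filter on `grandchildId` and `entityType`.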

Thank you for the feedback. I have updated my question: the application is read intensive. At the moment I am only considering the access patterns mentioned above.

Yes, you are right, the structure is deeply nested, but as you suggested I am storing the data as a flat hierarchy, that is, in linear fashion. There will be no nesting within documents. Each parent will have its own index, and every document related to that parent will be in that index as a separate flat document, whether it is a child, grandchild and so on...

The maximum number of fields I can think of right now is no more than 15.

I do not understand. Can you show some sample documents?
