Best practices for related and hierarchical data


I'm new to Elasticsearch and am grappling with how best to structure (or not structure) my model.

For example:
Thousands of users belong to a few hundred parent users, and I want to record an unbounded amount of activity for all users.

I'm guessing I'd do something like this:

* index_name/users: {
        account,
        parent_account,
        activity: [ { activity_name, time }, { ... }, { ... }, ... ]
  }

Does this look correct? I keep the hierarchy info 'flat' in each user with the account and parent_account fields, but I keep each user's activity as nested objects. Thinking ahead, would it be easy to get all users with the same parent account and aggregate their activity?

I'm also tempted to do something like

index_name/users: { .... }
index_name/activity: { ... }

and try to link activities to users that way somehow?

Relationship best practices
(Nik Everett) #2

Elasticsearch has a couple of ways to do this, none of which is perfect.

  • You manually flatten all the data into the right documents. This is often the best choice as it places the least overhead on Elasticsearch.
  • Parent/Child uses routing tricks to make very special-purpose query-time joins possible. It has non-trivial overhead, but it is possible.
  • Nested "unrolls" array fields into multiple documents, flattening the parent document into the child document.

Parent/Child vs. Nested is basically a trade-off between query complexity and an index-time and space hit. Manually flattening is going to function similarly to nested, except you'll understand exactly what is going on because you did it explicitly. Honestly, I prefer that way of doing things because it doesn't restrict the data layout in the shards.
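
To make the manual flattening option concrete, here is a minimal sketch in plain Python, not tied to any client library. The field names (`user_id`, `account`, `parent_account`, `activity_name`, `time`) and the sample values are assumptions for illustration; the idea is simply that each activity becomes its own document carrying a copy of the user's hierarchy fields.

```python
from typing import Any


def flatten_activity(user: dict[str, Any], activity: dict[str, Any]) -> dict[str, Any]:
    """Build one standalone activity document.

    The user's hierarchy fields are copied into the document so that it can
    be filtered or aggregated without any join.
    """
    return {
        "user_id": user["id"],                     # hypothetical field name
        "account": user["account"],                # copied from the user
        "parent_account": user["parent_account"],  # copied from the user
        "activity_name": activity["activity_name"],
        "time": activity["time"],
    }


# Hypothetical sample data.
user = {"id": "u1", "account": "acme-east", "parent_account": "acme"}
events = [
    {"activity_name": "login", "time": "2015-06-01T09:00:00Z"},
    {"activity_name": "upload", "time": "2015-06-01T09:05:00Z"},
]

# One flat document per activity, each ready to be indexed on its own.
docs = [flatten_activity(user, event) for event in events]
```

Because every document is self-contained, nothing forces a user's activities to live on the same shard as the user.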

You can't create two indexes and link them. Er, there is a lookup feature on the terms query, but it really doesn't work as a general-purpose join mechanism. You should have a look at it too, but generally the way to solve things like this is to flatten all the parent information into the activity.
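
As a rough sketch of the two approaches mentioned here, the request bodies below are written as plain Python dicts. The index name `users`, the document id `acme`, and the fields `member_ids`, `user_id`, `parent_account` and `activity_name` are all hypothetical, and the exact shape of the terms lookup differs between Elasticsearch versions (older releases also expect a type), so check it against the documentation for your version.

```python
import json


def terms_lookup_body(parent_id: str) -> dict:
    """Terms query that fetches its list of values from another document.

    Assumes a hypothetical parent document in a 'users' index whose
    'member_ids' field lists the ids of its child users.
    """
    return {
        "query": {
            "terms": {
                "user_id": {"index": "users", "id": parent_id, "path": "member_ids"}
            }
        }
    }


def flattened_body(parent_account: str) -> dict:
    """Filter and aggregate on a parent field copied into every activity.

    Assumes 'parent_account' and 'activity_name' are indexed as exact-value
    (not analyzed) fields.
    """
    return {
        "size": 0,  # only the aggregation buckets are wanted, not the hits
        "query": {"term": {"parent_account": parent_account}},
        "aggs": {"by_activity": {"terms": {"field": "activity_name"}}},
    }


lookup = terms_lookup_body("acme")
flat = flattened_body("acme")
print(json.dumps(flat, indent=2))
```

With the flattened layout, "all activity under one parent account" is an ordinary filter plus a terms aggregation, with no lookup step at all.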


Thanks for the reply. I found it helpful, and it confirmed much of my thought process. (Phew, I'm not that confused after all!) I'll generate some test data and try experimenting a bit.

I'm not exactly getting what nesting does, but I guess the outcome is similar to flattening parent fields into the activity.
I understand that it indexes nested objects as separate documents under the hood, even though I'd see and think of them as an array of objects.
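
For reference, a nested version of the layout might look roughly like the sketch below, again as plain Python dicts. It assumes the typeless mapping format of recent Elasticsearch versions (older versions wrap the properties in a type name), and the field names and types are guesses based on the model described above.

```python
# Hypothetical mapping: each element of 'activity' is indexed as its own
# hidden document alongside the user document.
nested_mapping = {
    "mappings": {
        "properties": {
            "account": {"type": "keyword"},
            "parent_account": {"type": "keyword"},
            "activity": {
                "type": "nested",
                "properties": {
                    "activity_name": {"type": "keyword"},
                    "time": {"type": "date"},
                },
            },
        }
    }
}

# A nested query: both conditions must hold within the SAME activity entry,
# rather than anywhere across the user's whole activity array.
nested_query = {
    "query": {
        "nested": {
            "path": "activity",
            "query": {
                "bool": {
                    "must": [
                        {"term": {"activity.activity_name": "login"}},
                        {"range": {"activity.time": {"gte": "2015-06-01"}}},
                    ]
                }
            },
        }
    }
}
```

The trade-off is that the queries have to be wrapped in `nested` clauses, and the user document keeps growing as activity accumulates.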

(Nik Everett) #4

Right! I tend to like to handle these types of things myself in the upstream application. That way Elasticsearch can scatter the activities to all of the shards. You have to manage changes in the parent object getting pushed to the child objects - but you have the option of just not making those changes and having the new value come through on new activities if that is the right thing to do. It is a silver lining, I guess.
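
The choice described above can be sketched with an in-memory list standing in for the activity index. The function names and the `backfill` flag are hypothetical; against a real cluster, the backfill branch would correspond to something like a reindex or an update-by-query rather than a Python loop.

```python
def record_activity(store: list, user: dict, name: str, time: str) -> None:
    """Append a flat activity document carrying the user's current parent."""
    store.append({
        "user_id": user["id"],
        "parent_account": user["parent_account"],  # snapshot at write time
        "activity_name": name,
        "time": time,
    })


def move_user(store: list, user: dict, new_parent: str, backfill: bool) -> None:
    """Change a user's parent, optionally rewriting their past activities."""
    user["parent_account"] = new_parent
    if backfill:
        # Push the change into every existing activity of this user.
        for doc in store:
            if doc["user_id"] == user["id"]:
                doc["parent_account"] = new_parent


store = []
user = {"id": "u1", "parent_account": "acme"}

record_activity(store, user, "login", "2015-06-01T09:00:00Z")
move_user(store, user, "globex", backfill=False)  # old activity is left alone
record_activity(store, user, "upload", "2015-06-02T10:00:00Z")

parents = [doc["parent_account"] for doc in store]
```

Without the backfill, older activities keep the parent they were recorded under and only new ones pick up the change, which is sometimes exactly the history you want.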
