Parent-child v/s multiple indexes - Elasticsearch 6.0+

Nithin_Chandy · May 14, 2018, 1:28am

We currently have an ES 5.3 cluster that uses parent-child mapping, as our data follows this model. Parent-child mapping was the best option because it supported cluster-side joins and gave performance benefits from parent and child documents being stored in the same shard.

Migrating from 5.3 to 6.2 involves breaking changes, because indices created in ES 6.0 or later can no longer contain multiple mapping types.

I'm wondering whether it's worth sticking with the parent-child data model in newer versions of Elasticsearch by using the join data type.

Our data has a parent-to-child document count ratio of at most 1:400. As of now, we have nearly 1 million parent docs. Both parent and child types are read and write heavy.

From what I know, there are three possible options:

1. Retain parent-child mapping using the new 'join' data type.

Pros

Cluster-side joins
Performance benefits
Can still use has_parent, has_child queries

Cons

Flat mapping schema (gets ugly if the parent/child documents have a lot of properties)
Difficult to identify the doc type just by looking at the document
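
For reference, here is a minimal sketch of what option 1 could look like, written as plain Python dicts that mirror the request bodies. The field name `relation`, the relation names `parent` and `child`, and the `_doc` type name are assumptions for illustration, not something taken from our cluster. Child documents also have to be indexed with `routing` set to the parent ID so they land on the same shard as the parent.

```python
# Sketch of option 1: one index, one mapping type, a join field.
# JOIN_FIELD and the relation names are hypothetical placeholders.
JOIN_FIELD = "relation"


def build_mapping():
    """Index-creation body with a join field (ES 6.x style, single type)."""
    return {
        "mappings": {
            "_doc": {
                "properties": {
                    JOIN_FIELD: {
                        "type": "join",
                        "relations": {"parent": "child"},
                    }
                }
            }
        }
    }


def parent_doc(fields):
    """Document body for a parent: the join field only names the relation."""
    body = dict(fields)
    body[JOIN_FIELD] = {"name": "parent"}
    return body


def child_doc(fields, parent_id):
    """Document body for a child: the join field also points at the parent.

    The index request itself must use routing=parent_id.
    """
    body = dict(fields)
    body[JOIN_FIELD] = {"name": "child", "parent": parent_id}
    return body


def has_child_query(child_query):
    """Search body returning parents that have at least one matching child."""
    return {"query": {"has_child": {"type": "child", "query": child_query}}}
```

Because parent and child share one mapping, every property of both ends up in the same flat `properties` block, which is the "flat mapping schema" downside listed above.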

2. Create separate indexes for each existing mapping type

Pros

Mapping looks much cleaner
Easy to identify the type of document just by looking at it
Can make use of routing to put all related documents (for example, all children of the same parent) in the same shard within the child index.
Each index can have a different shard configuration (flexibility)
Offers some index-level optimization. The parent-child doc count ratio is around 1:400, so we can configure the parent index with fewer shards to keep the overall number of shards low.

Cons

Requires a common field in both indexes to maintain relationship
Cannot use has_parent, has_child queries anymore. Most of our queries will need to hit both indexes and will need two queries to complete the task. We can optimize this a little by denormalizing data, but that will result in data duplication.
Requires application-side joins
Child index can get very big compared to parent index
Multiple queries to perform the join can increase the overall latency
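
To make the two-query cost concrete, here is a rough sketch of the application-side join that option 2 would need. The index names `parents` and `children`, the link field `parent_id`, and the `search(index, body)` callable are all hypothetical; `search` stands in for whatever client call is used and is assumed to return a list of hits.

```python
from typing import Callable, List

# Hypothetical signature: search(index_name, request_body) -> list of hit dicts.
SearchFn = Callable[[str, dict], List[dict]]


def find_children_of_matching_parents(
    search: SearchFn,
    parent_query: dict,
    child_query: dict,
    parent_index: str = "parents",
    child_index: str = "children",
    link_field: str = "parent_id",
) -> List[dict]:
    """Emulate a has_parent query with two round trips."""
    # Round trip 1: fetch only the IDs of the matching parents.
    parent_hits = search(parent_index, {"query": parent_query, "_source": False})
    parent_ids = [hit["_id"] for hit in parent_hits]
    if not parent_ids:
        return []

    # Round trip 2: restrict the child query to those parent IDs.
    body = {
        "query": {
            "bool": {
                "must": child_query,
                "filter": {"terms": {link_field: parent_ids}},
            }
        }
    }
    return search(child_index, body)
```

The second request cannot start until the first one has returned, so the latencies add up, and a parent query that matches many documents produces a long ID list that may have to be paged or sent in batches.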

3. Single index but with a custom field to define type of the document

Pros

No need to use join type
Can make use of custom routing to put related documents in the same shard

Cons

Mapping looks complicated
Index can get very big as all the documents will reside in the same index
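
A small sketch of how option 3 might be wired up, again as plain Python dicts. The field names `doc_type` and `parent_id` are assumptions, and the helpers only build request parameters rather than calling a client. The idea is that a child is routed by its parent's ID and a parent by its own ID, so a family of documents shares a shard without a join field.

```python
def index_request(doc_id, doc_type, fields, parent_id=None):
    """Build the parameters for indexing one document.

    Parents are routed by their own ID and children by their parent's ID,
    so both end up on the same shard.
    """
    routing = parent_id if doc_type == "child" else doc_id
    body = dict(fields)
    body["doc_type"] = doc_type  # hypothetical custom type field
    if parent_id is not None:
        body["parent_id"] = parent_id  # hypothetical link field
    return {"id": doc_id, "routing": routing, "body": body}


def children_search(parent_id):
    """Build a search for all children of one parent.

    Passing the same routing value limits the search to a single shard.
    """
    return {
        "routing": parent_id,
        "body": {
            "query": {
                "bool": {
                    "filter": [
                        {"term": {"doc_type": "child"}},
                        {"term": {"parent_id": parent_id}},
                    ]
                }
            }
        },
    }
```

This keeps data locality, but questions like "parents that have a child matching X" still need two queries from the application, as in option 2.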

I'm leaning towards the second option of having multiple indices. The main downside I see is that application-side joins can be expensive. At the same time, the Elasticsearch documentation says has_parent and has_child queries are also expensive.

The other advantage of parent-child documents, being stored in the same shard, can be achieved to some extent by using routing in the individual indices, which gives data locality within each index.

I would like to know if there is any other performance or scaling aspect I need to consider. Also, can someone comment on the latency impact of each of these approaches? Thanks.

system · June 11, 2018, 1:35am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
