ES 7 join datatype aggregations

franck.lefebure · November 16, 2017, 3:04pm

Hi,
I've seen (for example in this blog post: https://www.elastic.co/blog/index-type-parent-child-join-now-future-in-elasticsearch) that the type-based has_parent/has_child relationships (and types themselves) will be deprecated in favour of the new join datatype.

We make heavy use of these parent/child relations and of children aggregations.
For our use cases we are missing, and dreaming of, a "parent" or "reverse-child" aggregation.

Can we expect a "reverse-join" aggregation in future versions?

Franck

jimczi · November 24, 2017, 9:30am

We could, but this requires understanding why you'd need such an aggregation. Can you describe the use case you want to solve?

franck.lefebure · November 26, 2017, 11:23pm

Hello Jim,

Well, to describe the use case, I have to describe some aspects of our platform.
I will try to be as concise as I can.

We at www.softbridge.fr are trying to marry process mining with real-time full-text search analysis.
Our platform starts with the L of ELK, but the data fetched by Logstash goes to a custom stack whose purpose is to find correlations and precedence relations in this data.
At the output of our stack, the data is essentially a collection of directed acyclic graphs.
At the root of each graph is the "case". Below it is a recursive structure of "activities" (yes, we use the BPMN conventions).
The case is essentially equivalent to the root activity (we call it the top-activity).
Each activity has a reference, a parent-reference, and a bunch of nested metadata along with quality and performance indicators.
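A minimal sketch of the recursive activity structure described above, assuming each activity carries exactly one parent-reference (which makes each case a tree rooted at the top-activity). Only `reference` and `parent_reference` come from the post; the `Activity` class, `activity_type`, `duration`, and the helper functions are hypothetical illustrations.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterator, List, Optional, Tuple


@dataclass
class Activity:
    # `reference` and `parent_reference` come from the description above;
    # `activity_type` and `duration` are hypothetical illustration fields.
    reference: str
    parent_reference: Optional[str]  # None only for the top-activity (the case)
    activity_type: str
    duration: float = 0.0


def children_by_parent(activities: List[Activity]) -> Dict[str, List[Activity]]:
    """Group activities under the reference of the activity they point to."""
    index: Dict[str, List[Activity]] = defaultdict(list)
    for act in activities:
        if act.parent_reference is not None:
            index[act.parent_reference].append(act)
    return index


def top_activity(activities: List[Activity]) -> Activity:
    """Return the single parentless activity, i.e. the case."""
    roots = [a for a in activities if a.parent_reference is None]
    if len(roots) != 1:
        raise ValueError(f"expected one top-activity, found {len(roots)}")
    return roots[0]


def walk(activities: List[Activity]) -> Iterator[Tuple[int, Activity]]:
    """Depth-first traversal yielding (depth, activity) from the case down."""
    index = children_by_parent(activities)
    stack = [(0, top_activity(activities))]
    while stack:
        depth, act = stack.pop()
        yield depth, act
        # Push children in reverse so they are visited in their original order.
        for child in reversed(index.get(act.reference, [])):
            stack.append((depth + 1, child))


acts = [
    Activity("case-1", None, "C1"),
    Activity("a1", "case-1", "A1", 12.0),
    Activity("a2", "a1", "A2", 3.5),
    Activity("a3", "case-1", "A1", 7.0),
]
print([(d, a.reference) for d, a in walk(acts)])
# [(0, 'case-1'), (1, 'a1'), (2, 'a2'), (1, 'a3')]
```

Because the walk starts from the only parentless node, it terminates even if the input contained a stray cycle elsewhere; such nodes are simply never reached.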

The cases/activities are then indexed in an ES cluster.

With our frontend (we don't use the K of ELK because of its poor support for nested and parent/child relations), we can search and scroll the case and activity entities.
Cases and activities are two types in the same index. The activities are linked to their case with a parent/child relation.
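With the join datatype that replaces types in later versions, the same case/activity link could be expressed roughly as below. This is a sketch assuming Elasticsearch 7-style typeless mappings; the index name `cases`, the join field name `relation`, and the other field names are hypothetical. The request bodies are built as plain dicts and serialized for the `_bulk` API, so no client library is assumed.

```python
import json

INDEX = "cases"  # hypothetical index name

# One join field declares "case" as parent and "activity" as child.
# All field names here are assumptions for illustration.
MAPPING = {
    "mappings": {
        "properties": {
            "relation": {"type": "join", "relations": {"case": "activity"}},
            "case_type": {"type": "keyword"},
            "activity_type": {"type": "keyword"},
            "parent_reference": {"type": "keyword"},
            "duration": {"type": "double"},
        }
    }
}


def case_actions(case_id, case_type):
    """Bulk action + source for a parent (case) document."""
    return [
        {"index": {"_index": INDEX, "_id": case_id}},
        {"case_type": case_type, "relation": {"name": "case"}},
    ]


def activity_actions(case_id, reference, parent_reference, activity_type, duration):
    """Bulk action + source for a child (activity) document."""
    return [
        # Child documents must be routed to the same shard as their parent.
        {"index": {"_index": INDEX, "_id": reference, "routing": case_id}},
        {
            "activity_type": activity_type,
            # The activity-to-activity hierarchy stays a plain keyword field;
            # only the case/activity link uses the join field.
            "parent_reference": parent_reference,
            "duration": duration,
            "relation": {"name": "activity", "parent": case_id},
        },
    ]


def to_ndjson(actions):
    """Serialize actions as a _bulk body (newline-delimited, trailing newline)."""
    return "".join(json.dumps(a) + "\n" for a in actions)


body = to_ndjson(
    case_actions("case-1", "C1")
    + activity_actions("case-1", "a1", "case-1", "A1", 12.0)
)
print(body)
```

The join field only allows one parent per document, so it models the case/activity link; the deeper activity hierarchy would still live in an ordinary field such as the hypothetical `parent_reference`.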

Our users can identify a case with criteria on its activities.
For example, they can "search the C1-type cases whose A1-type activities last longer than x".
They can also run aggregations, for example "histogram of the duration of A1 activities for C1-type cases" or "histogram of the duration of A1 activities for cases that have an A2 activity".
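The two kinds of requests described above map onto a `has_child` query and a `children` aggregation. A minimal sketch, assuming a join field whose child relation is named `activity` and hypothetical keyword/numeric fields `case_type`, `activity_type`, and `duration`:

```python
import json


def c1_cases_with_long_a1(min_duration):
    """Search: C1 cases having at least one A1 activity longer than min_duration."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"case_type": "C1"}},
                    {
                        "has_child": {
                            "type": "activity",  # child relation name (assumed)
                            "query": {
                                "bool": {
                                    "filter": [
                                        {"term": {"activity_type": "A1"}},
                                        {"range": {"duration": {"gt": min_duration}}},
                                    ]
                                }
                            },
                        }
                    },
                ]
            }
        }
    }


def a1_duration_histogram_for_c1(interval):
    """Aggregation: histogram of A1 durations, restricted to C1 cases."""
    return {
        "size": 0,
        "query": {"term": {"case_type": "C1"}},  # selects the parent cases
        "aggs": {
            "activities": {
                "children": {"type": "activity"},  # step down to child activities
                "aggs": {
                    "a1_only": {
                        "filter": {"term": {"activity_type": "A1"}},
                        "aggs": {
                            "duration": {
                                "histogram": {"field": "duration", "interval": interval}
                            }
                        },
                    }
                },
            }
        },
    }


print(json.dumps(c1_cases_with_long_a1(60), indent=2))
```

Going from parents down to children in an aggregation is what `children` provides; the opposite direction is exactly the missing "parent" aggregation discussed in this thread.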

What would be interesting for us is cascading aggregations through the activity hierarchy.
We would start with an aggregation on one activity type, then aggregate on another kind of activity.

For us it means "study one characteristic of the A1-type activities, then see its impact on the A2-type activities".
If the activities were modelled as a nested structure on the cases, we could do a reverse_nested agg up to the case, followed by a nested agg back down to the activities, followed by some filter.
But we use a parent/child relation, mainly because we have to search and scroll on the "activity" type, and we can't if the activities are nested entities (maybe you will tell me that there is a way, in fact!).
(A workaround would be to duplicate the activities both as children and as nested objects, but that would double our data volume...)
Another much-wanted feature would be access to parent/child entities in scripts.
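For reference, the nested-model cascade described above (nested, then reverse_nested up to the case, then nested back down with a filter) could look like the following sketch. It assumes a hypothetical nested field `activities` with sub-fields `activity_type` and `duration`; the aggregation names are illustrative.

```python
import json


def a1_then_a2(interval):
    """Bucket by A1 duration, then inspect A2 activities of the same cases."""
    return {
        "size": 0,
        "aggs": {
            "acts": {
                "nested": {"path": "activities"},
                "aggs": {
                    "a1": {
                        "filter": {"term": {"activities.activity_type": "A1"}},
                        "aggs": {
                            "a1_duration": {
                                "histogram": {
                                    "field": "activities.duration",
                                    "interval": interval,
                                },
                                "aggs": {
                                    # Jump from the nested A1 activity up to its case (root doc)...
                                    "to_case": {
                                        "reverse_nested": {},
                                        "aggs": {
                                            # ...then back down into all activities of that case.
                                            "acts_again": {
                                                "nested": {"path": "activities"},
                                                "aggs": {
                                                    "a2": {
                                                        "filter": {
                                                            "term": {"activities.activity_type": "A2"}
                                                        },
                                                        "aggs": {
                                                            "a2_avg_duration": {
                                                                "avg": {"field": "activities.duration"}
                                                            }
                                                        },
                                                    }
                                                },
                                            }
                                        },
                                    }
                                },
                            }
                        },
                    }
                },
            }
        },
    }


print(json.dumps(a1_then_a2(10), indent=2))
```

Each A2 statistic covers all A2 activities of the cases that contributed to that A1 duration bucket, which is the "impact on A2" view described in the post.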

So that is some parts of what we do.

FYI, we are thinking about showing our work at Elastic{ON} 2018 in SF.

Franck

jimczi · December 5, 2017, 2:33pm

Sorry for the late reply. Accessing parents or children in aggregations is costly. For every child in the aggregation we'd have to retrieve its parent across the entire shard, so the performance of such aggregations would suffer from this lookup. In general we don't recommend using parent/child unless your use case needs to update children very frequently or the number of children per parent is too large to fit in a single document. The nested type is much more powerful when it comes to aggregations and search, so I'd advise switching to this field type and creating another index that contains all the activities for scrolling purposes.
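A minimal sketch of the suggested layout, assuming a hypothetical `cases` index where activities are a `nested` field and a separate hypothetical `activities` index holding one flat document per activity for search and scroll. Copying `case_id` and `case_type` onto each activity document is an assumption so activities can still be filtered by their case; all field names are illustrative.

```python
# Case index: activities embedded as nested objects (for aggregations).
CASES_MAPPING = {
    "mappings": {
        "properties": {
            "case_type": {"type": "keyword"},
            "activities": {
                "type": "nested",
                "properties": {
                    "reference": {"type": "keyword"},
                    "parent_reference": {"type": "keyword"},
                    "activity_type": {"type": "keyword"},
                    "duration": {"type": "double"},
                },
            },
        }
    }
}

# Activity index: one flat document per activity (for search and scroll).
ACTIVITIES_MAPPING = {
    "mappings": {
        "properties": {
            "case_id": {"type": "keyword"},
            "case_type": {"type": "keyword"},
            "reference": {"type": "keyword"},
            "parent_reference": {"type": "keyword"},
            "activity_type": {"type": "keyword"},
            "duration": {"type": "double"},
        }
    }
}


def fan_out(case_id, case_type, activities):
    """Bulk actions writing one case document plus one document per activity."""
    actions = [
        {"index": {"_index": "cases", "_id": case_id}},
        {"case_type": case_type, "activities": activities},
    ]
    for act in activities:
        actions.append({"index": {"_index": "activities", "_id": act["reference"]}})
        # New dict per activity: the nested copy above is left unchanged.
        actions.append({**act, "case_id": case_id, "case_type": case_type})
    return actions
```

This layout stores every activity twice, which is the storage trade-off mentioned earlier in the thread, in exchange for nested aggregations on one index and plain scrolling on the other.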

system · January 2, 2018, 2:33pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
