Running aggregations on two different nested objects


(Jean-Noël Rivasseau) #1

Hello,

I just started using Elasticsearch 1.0.1. I am trying to find the ideal
data model and query for my exact needs, which I explain below. (I changed
only the names in the data model of my real use case, to see whether I
could formulate the problem differently, which turned out to be useful.)

I am indexing BookedStay documents. A BookedStay has a nested array (named
placesVisited) containing one object per place visited during the stay.
Each object has an id (the place id) and a category giving the time of day
of the visit. A BookedStay also has a second nested array (amenities)
listing the amenities used during the stay; each of its objects has an
amenityId (a string) and a count.

So a BookedStay can be represented as:

    {
      "date": "2014-03-03",
      "placesVisited": [
        {"id": 3, "category": "MORNING"},
        {"id": 5, "category": "AFTERNOON"},
        {"id": 7, "category": "EVENING"}
      ],
      "amenities": [
        {"amenityId": "restaurant", "count": 3},
        {"amenityId": "dvdPlayer", "count": 1}
      ]
    }

What I want is a query that takes a given place id and, over all
BookedStays whose placesVisited array contains that place, aggregates the
amenities used, broken down by time of day.

For instance, for all documents that have a place id of 5, I want the
number of times the restaurant or the DVD player in the lounge was used,
broken down by time of day. The ultimate goal is to better understand how
visiting a place at a given time of day affects the services sold by the
hotel.
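To pin down the intended result, here is a minimal in-memory sketch in plain Java (no Elasticsearch involved) of what the aggregation should compute. The class `DesiredAggregation` and its method are hypothetical names; the sketch assumes that "number of times used" means summing each amenity's `count`, and that a stay visiting the same place twice in one time slot counts once for that slot. The key point is visible in the inner loop: the amenities come from the stay (the parent), not from the place visit.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical plain-Java model of the BookedStay documents (no Elasticsearch involved).
final class DesiredAggregation {
    record PlaceVisit(int id, String category) {}
    record Amenity(String amenityId, int count) {}
    record BookedStay(String date, List<PlaceVisit> placesVisited, List<Amenity> amenities) {}

    /** time of day -> amenityId -> summed count, over the stays that visited placeId. */
    static Map<String, Map<String, Integer>> amenitiesByTimeOfDay(List<BookedStay> stays, int placeId) {
        Map<String, Map<String, Integer>> result = new TreeMap<>();
        for (BookedStay stay : stays) {
            // Time slots in which this stay visited the target place (each slot once per stay).
            Set<String> categories = new TreeSet<>();
            for (PlaceVisit visit : stay.placesVisited()) {
                if (visit.id() == placeId) {
                    categories.add(visit.category());
                }
            }
            // The amenities belong to the same stay (the "parent"), not to the visit itself.
            for (String category : categories) {
                Map<String, Integer> perAmenity = result.computeIfAbsent(category, k -> new TreeMap<>());
                for (Amenity amenity : stay.amenities()) {
                    perAmenity.merge(amenity.amenityId(), amenity.count(), Integer::sum);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        BookedStay stay = new BookedStay("2014-03-03",
                List.of(new PlaceVisit(3, "MORNING"), new PlaceVisit(5, "AFTERNOON"),
                        new PlaceVisit(7, "EVENING")),
                List.of(new Amenity("restaurant", 3), new Amenity("dvdPlayer", 1)));
        System.out.println(amenitiesByTimeOfDay(List.of(stay), 5));
    }
}
```

With only the sample stay above, `amenitiesByTimeOfDay(stays, 5)` yields `{AFTERNOON={dvdPlayer=1, restaurant=3}}`.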

I cannot express this query: after a first nested aggregation over the
category, I cannot nest a second one over the amenities, because they live
in the "parent" document. Is this possible? If so, how do I specify that a
nested aggregation should run over the parent object of the current
aggregation?

Here is a tentative query using the Java API (obviously not working,
because of the problem above):

    import org.elasticsearch.action.search.SearchRequestBuilder;
    import org.elasticsearch.action.search.SearchType;
    import org.elasticsearch.index.query.FilterBuilders;
    import org.elasticsearch.index.query.QueryBuilders;
    import org.elasticsearch.search.aggregations.AggregationBuilders;

    SearchRequestBuilder srb = elasticSearchService.getClient()
            .prepareSearch("test_index")
            .setSearchType(SearchType.COUNT)
            .setTypes("test_stay")
            .setQuery(QueryBuilders.nestedQuery("placesVisited",
                    QueryBuilders.termQuery("placesVisited.id", 5)))
            .addAggregation(AggregationBuilders.nested("nestedPlaceVisited").path("placesVisited")
                    .subAggregation(AggregationBuilders.filter("currentPlaceFilter")
                            .filter(FilterBuilders.termFilter("placesVisited.id", 5))
                            .subAggregation(AggregationBuilders.terms("countPerTimeCategory")
                                    .field("placesVisited.category")
                                    // HERE this sub-aggregation should run over the original (parent)
                                    // document, and I don't know how to achieve that
                                    .subAggregation(AggregationBuilders.nested("nestedAmenities").path("amenities")
                                            .subAggregation(AggregationBuilders.terms("amenitiesUsed").field("amenities.amenityId"))))));

Thanks for your help with this difficult problem! If it's not possible with
"parent aggregations", how should I refactor my data model?

Jean-Noel

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/e8a752fc-0a96-437d-b071-4009c0f39d33%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Adrien Grand) #2

Hi,

The aggregation doesn't work because today, when you enter the context of a
nested field in an aggregation, it is not possible to escape it. I don't
think there is an easy way to modify your data model in order to work
around this issue, however this is an issue that we plan to fix in the
future (not in the upcoming 1.1 release however, rather in a few months).

On Wed, Mar 19, 2014 at 10:00 AM, Jean-Noël Rivasseau elvanor@gmail.com wrote:

--
Adrien Grand



(Jean-Noël Rivasseau) #3

Hello,

Thanks for your reply. What do you mean by "not possible to escape it"?
Could you provide some sample Java code that would work once the necessary
changes are implemented?

Jean-Noel

On Thursday, March 20, 2014 1:51:54 PM UTC+4, Adrien Grand wrote:



(Adrien Grand) #4

On Fri, Mar 21, 2014 at 8:15 AM, Jean-Noël Rivasseau elvanor@gmail.com wrote:

Thanks for your reply. What do you mean by "not possible to escape it"?
Could you provide some sample Java code that would work once the necessary
changes are implemented?

The nested field mapper stores each nested object as a separate Lucene
document. For every incoming (parent) document ID, the nested aggregation
calls its sub-aggregations with the document IDs of the child documents.
The sub-aggregations are not aware that they are being applied to child
documents; to them it makes no difference, they just do their usual work,
but on different doc IDs.
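A toy model of that layout may help. The sketch below assumes a simplified index in which each nested object is its own document, stored before its parent in the same block (children first, parent last, which is the ordering Lucene block joins rely on; the order among siblings here is an assumption). `NestedDocsSketch`, `childDocIds` and `termsCount` are hypothetical names, and field values are plain strings for brevity. Starting from the parent, the nested step yields only child doc IDs, and a terms-style sub-aggregation run on them simply cannot see the amenities, which live in sibling documents.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy model of how nested objects are laid out as separate documents.
final class NestedDocsSketch {
    // A flattened document: path == null marks the root (parent) document,
    // otherwise path names the nested field the object came from.
    record Doc(String path, Map<String, String> fields) {
        boolean isParent() { return path == null; }
    }

    static Doc place(String id, String category) {
        return new Doc("placesVisited", Map.of("placesVisited.id", id, "placesVisited.category", category));
    }

    static Doc amenity(String amenityId) {
        return new Doc("amenities", Map.of("amenities.amenityId", amenityId));
    }

    // One BookedStay block: children (doc IDs 0-4) first, parent (doc ID 5) last.
    static final List<Doc> INDEX = List.of(
            place("3", "MORNING"), place("5", "AFTERNOON"), place("7", "EVENING"),
            amenity("restaurant"), amenity("dvdPlayer"),
            new Doc(null, Map.of("date", "2014-03-03")));

    // The nested step: for a parent doc ID, return the IDs of its children under `path`,
    // i.e. the matching docs between the previous parent and this one.
    static List<Integer> childDocIds(List<Doc> index, int parentId, String path) {
        List<Integer> children = new ArrayList<>();
        for (int id = parentId - 1; id >= 0 && !index.get(id).isParent(); id--) {
            if (path.equals(index.get(id).path())) {
                children.add(0, id);
            }
        }
        return children;
    }

    // A terms-style sub-aggregation: it counts field values on whatever doc IDs it
    // receives, with no notion of whether they are parents or children.
    static Map<String, Integer> termsCount(List<Doc> index, List<Integer> docIds, String field) {
        Map<String, Integer> buckets = new TreeMap<>();
        for (int id : docIds) {
            String value = index.get(id).fields().get(field);
            if (value != null) {
                buckets.merge(value, 1, Integer::sum);
            }
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<Integer> places = childDocIds(INDEX, 5, "placesVisited");
        System.out.println(places);                                              // [0, 1, 2]
        System.out.println(termsCount(INDEX, places, "placesVisited.category")); // {AFTERNOON=1, EVENING=1, MORNING=1}
        // The amenities are sibling documents, so they are invisible from the child docs.
        System.out.println(termsCount(INDEX, places, "amenities.amenityId"));    // {}
    }
}
```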

To make your aggregation work, we would need another aggregation able to
translate these child doc IDs back to their parent's doc ID, which is
something we don't have today.
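Here is a sketch of that missing translation step on the same kind of toy index. All names are hypothetical, buckets use doc-count semantics like a terms aggregation, and a second, made-up stay is added so that two time-of-day buckets appear. Because a parent is stored after its children, mapping a child doc ID back to its parent amounts to scanning forward to the next parent document; once each category bucket has been mapped back to its parent stays, a second nested step can reach the amenities. (Elasticsearch 1.2 later added a `reverse_nested` aggregation that plays this role.)

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch of the missing "child doc ID -> parent doc ID" step.
final class ReverseNestedSketch {
    // path == null marks a root (parent) document; children are stored before their parent.
    record Doc(String path, Map<String, String> fields) {
        boolean isParent() { return path == null; }
    }

    static Doc place(String id, String category) {
        return new Doc("placesVisited", Map.of("placesVisited.id", id, "placesVisited.category", category));
    }

    static Doc amenity(String amenityId) {
        return new Doc("amenities", Map.of("amenities.amenityId", amenityId));
    }

    static Doc stay(String date) {
        return new Doc(null, Map.of("date", date));
    }

    // Two blocks: the example stay (doc IDs 0-5) and a made-up second stay (doc IDs 6-8).
    static final List<Doc> INDEX = List.of(
            place("3", "MORNING"), place("5", "AFTERNOON"), place("7", "EVENING"),
            amenity("restaurant"), amenity("dvdPlayer"), stay("2014-03-03"),
            place("5", "MORNING"), amenity("restaurant"), stay("2014-03-04"));

    static List<Integer> childDocIds(List<Doc> index, int parentId, String path) {
        List<Integer> children = new ArrayList<>();
        for (int id = parentId - 1; id >= 0 && !index.get(id).isParent(); id--) {
            if (path.equals(index.get(id).path())) {
                children.add(0, id);
            }
        }
        return children;
    }

    // The missing step: a parent is stored after its children, so the parent of a
    // child doc is the first parent document found when scanning forward.
    static int parentOf(List<Doc> index, int childId) {
        int id = childId;
        while (!index.get(id).isParent()) {
            id++;
        }
        return id;
    }

    // nested(placesVisited) > filter(id) > terms(category) > [to parent] > nested(amenities) > terms(amenityId)
    static Map<String, Map<String, Integer>> aggregate(List<Doc> index, String placeId) {
        // Nested step: collect child doc IDs; from here on only those IDs are seen.
        List<Integer> placeDocs = new ArrayList<>();
        for (int id = 0; id < index.size(); id++) {
            if (index.get(id).isParent()) {
                placeDocs.addAll(childDocIds(index, id, "placesVisited"));
            }
        }
        // Filter + terms on the child docs, then translate each child back to its parent
        // (a set, so a stay counts once per time-of-day bucket).
        Map<String, Set<Integer>> parentsByCategory = new TreeMap<>();
        for (int child : placeDocs) {
            Map<String, String> visit = index.get(child).fields();
            if (placeId.equals(visit.get("placesVisited.id"))) {
                parentsByCategory.computeIfAbsent(visit.get("placesVisited.category"), k -> new TreeSet<>())
                        .add(parentOf(index, child));
            }
        }
        // Back on the parents, a second nested step reaches the amenities.
        Map<String, Map<String, Integer>> result = new TreeMap<>();
        for (Map.Entry<String, Set<Integer>> bucket : parentsByCategory.entrySet()) {
            Map<String, Integer> amenities = new TreeMap<>();
            for (int parent : bucket.getValue()) {
                for (int child : childDocIds(index, parent, "amenities")) {
                    amenities.merge(index.get(child).fields().get("amenities.amenityId"), 1, Integer::sum);
                }
            }
            result.put(bucket.getKey(), amenities);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(aggregate(INDEX, "5"));
    }
}
```

On this toy index, `aggregate(INDEX, "5")` gives `{AFTERNOON={dvdPlayer=1, restaurant=1}, MORNING={restaurant=1}}`: each time-of-day bucket of place 5 is mapped back to its stays before their amenities are counted.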

--
Adrien Grand
