Filter aggregation and nested documents

Olivier_B · April 28, 2014, 7:57am

Hi all,

I'm working with nested documents (millions of them) and I run
aggregations on those nested documents. Naturally, I need to use a filter
aggregation
(http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-filter-aggregation.html),
but this does not seem to work with nested documents:

{
  "aggs": {
    "items": {
      "nested": {
        "path": "items"
      },
      "filter": {
        "ids": {
          "values": [
            "2AA4CE67-9469-4AE7-AC99-46F7E2646C2F"
          ]
        }
      },
      "aggs": {
        "questions": {
          "terms": {
            "field": "items.question_label.raw",
            "size": 0
          }
        }
      }
    }
  }
}

Response:
Parse Failure [Found two aggregation type definitions in [items]: [nested]
and [filter]. Only one type is allowed.]]; }]
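Going by the error message, each named aggregation accepts only one type, so "nested" and "filter" cannot sit side by side under "items". A minimal sketch of one way around the parse failure is to make the filter aggregation the outer bucket and put the nested aggregation inside it as a sub-aggregation. The helper name filtered_nested_terms and the bucket name "by_id" are made up for illustration; the body is built in Python only so it can be printed as JSON and sent with whatever client you use.

```python
import json


def filtered_nested_terms(doc_id, path, field):
    """Build a request body in which every aggregation has a single type.

    Hypothetical helper: the outer "by_id" bucket (name chosen here, not
    from the thread) filters root documents by ID, and the nested
    aggregation runs inside that bucket.
    """
    return {
        "aggs": {
            "by_id": {
                # Outer aggregation: type "filter" only.
                "filter": {"ids": {"values": [doc_id]}},
                "aggs": {
                    "items": {
                        # Inner aggregation: type "nested" only.
                        "nested": {"path": path},
                        "aggs": {
                            "questions": {
                                "terms": {"field": field, "size": 0}
                            }
                        },
                    }
                },
            }
        }
    }


body = filtered_nested_terms(
    "2AA4CE67-9469-4AE7-AC99-46F7E2646C2F",
    "items",
    "items.question_label.raw",
)
print(json.dumps(body, indent=2))
```

This only addresses the parse error. It is not claimed to change how much fielddata gets loaded for the aggregated field.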

So I tried another way:
{
  "query": {
    "filtered": {
      "filter": {
        "ids": {
          "values": [
            "2AA4CE67-9469-4AE7-AC99-46F7E2646C2F"
          ]
        }
      }
    }
  },
  "aggs": {
    "items": {
      "nested": {
        "path": "items"
      },
      "aggs": {
        "questions": {
          "terms": {
            "field": "items.question_label.raw",
            "size": 0
          }
        }
      }
    }
  }
}

This works, but:

- it takes several seconds,
- the cache fills up very quickly,
- once the cache is full, new queries are refused (I'm using ES 1.1.1
  with the circuit breaker).

Of course, this is not acceptable for production.

So basically, I have millions of documents, but in my example I aggregate
within a single document containing around 100 nested documents with 10
fields, and it takes 2 GB of memory for the data cache and several
seconds.
My guess is that the filter does not help much and that the aggregation
runs over all documents before filtering (not the other way around, as I
expected).

Is there any better solution for filter aggregation with nested documents?

Many thanks!

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/4bf1cf1d-8f4b-41f1-add1-efa952691b64%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Binh_Ly_2 · April 28, 2014, 1:55pm

You are correct. Unfortunately, the fielddata is loaded for all docs
regardless of the filter condition. You can:

- Add more RAM.
- Add more nodes (and shard your index out so that RAM usage is
  distributed across multiple nodes).
- Use disk-based fielddata (fielddata will not be loaded into memory) for
  the field(s) you are aggregating on. This will run slower, and you have to
  reindex your data.
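
For the disk-based option, here is a rough sketch of what the mapping might look like, assuming that "disk-based fielddata" refers to the doc_values fielddata format of the 1.x line and that items.question_label.raw is a not_analyzed string sub-field of question_label inside the nested items object. Neither the mapping nor the field types are shown in this thread, so treat every property below as a guess, and check the fielddata documentation for your exact version before reindexing.

```python
import json


def doc_values_mapping():
    """Return a hypothetical type mapping for the field being aggregated on.

    Assumptions (not confirmed by the thread): "items" is mapped as nested,
    "question_label" is a string with a "raw" sub-field, and "raw" is
    not_analyzed so that it can use the doc_values fielddata format.
    """
    return {
        "properties": {
            "items": {
                "type": "nested",
                "properties": {
                    "question_label": {
                        "type": "string",
                        "fields": {
                            "raw": {
                                "type": "string",
                                "index": "not_analyzed",
                                # Fielddata kept on disk instead of the heap.
                                "fielddata": {"format": "doc_values"},
                            }
                        },
                    }
                },
            }
        }
    }


print(json.dumps(doc_values_mapping(), indent=2))
```

Existing documents would not pick up such a setting on their own, which is why the reindex mentioned above is needed.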


Olivier_B · April 29, 2014, 1:04am

Thank you.
OK, that's what I feared: the cache is loaded regardless of the filter
condition. That's a shame: even if we filter heavily, targeting only one
document, we still need to fill up the cache!
I will try adding a lot of RAM to see whether memory usage stabilizes,
and then leave the cache as it is.
An alternative is to have many indexes, so that each index acts as a
pre-filter and contains far less data.
Do you know whether the fielddata cache loads all docs, or only those of
the relevant shard? Would it help to have smaller shards?


x0ne_2 · July 14, 2014, 5:25pm

When fielddata is loaded, does it load only what the aggregation
needs (items.question_label.raw in this case), or does it load the full
_source of every match and extract the field?

