because the cache is full, it refuses new queries (I'm using ES 1.1.1
with the circuit breaker).
Of course, this is not acceptable for production.
So basically, I have millions of documents, but in my example I aggregate
within a single document containing around 100 nested documents with 10
fields, and... it takes 2 GB of memory for the fielddata cache and several
seconds to run.
My guess is that the filter is not helping much: the aggregation seems to
load all documents before filtering (and not the other way around, as I
expected).
Is there a better solution for filter aggregations with nested documents?
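
For readers following along: the original query is not shown in this thread, so the following is only a sketch of a request that would match the description above (filter down to one document, then aggregate over its nested documents). It assumes the ES 1.x `filtered` query with an `ids` filter and a `nested` aggregation; the document id and the nested path `items` are hypothetical, and the field name is the one mentioned later in the thread.

```python
import json


def build_request(doc_id):
    """Build an ES 1.x style search body: filter to one doc, then aggregate."""
    return {
        "size": 0,  # only the aggregation result is wanted, not the hits
        "query": {
            "filtered": {
                # Hypothetical: restrict the search to a single parent document.
                "filter": {"ids": {"values": [doc_id]}}
            }
        },
        "aggs": {
            "items": {
                # Assumed nested path; step into the nested documents.
                "nested": {"path": "items"},
                "aggs": {
                    "labels": {
                        "terms": {"field": "items.question_label.raw"}
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_request("some-doc-id"), indent=2))
```

Even with a filter this narrow, the terms aggregation still needs fielddata for the aggregated field, which is the behaviour being asked about.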

You are correct. Unfortunately, the fielddata is loaded for all docs
regardless of the filter condition. You can:
- Add more RAM.
- Add more nodes (and shard your index out so that RAM usage will be
  distributed across multiple nodes).
- Use disk-based fielddata (fielddata will not be loaded into memory) for
  the field/s you are aggregating on. This will run slower, and you will
  have to reindex your data.
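
As a rough illustration of the disk-based fielddata option, here is a minimal sketch of a mapping that requests doc values for one field. It assumes the ES 1.x mapping syntax, where a not_analyzed string field can set "fielddata": {"format": "doc_values"}. The type name `survey` and the nested path `items` are hypothetical; only the field name items.question_label.raw comes from this thread. Please check it against the documentation for your version before relying on it.

```python
import json


def build_mapping(type_name="survey"):
    """Return a mapping dict that asks for doc values on question_label.raw."""
    raw_subfield = {
        "type": "string",
        "index": "not_analyzed",
        # Assumed ES 1.x syntax for disk-based fielddata (doc values).
        "fielddata": {"format": "doc_values"},
    }
    return {
        type_name: {  # hypothetical type name
            "properties": {
                "items": {  # hypothetical nested path
                    "type": "nested",
                    "properties": {
                        "question_label": {
                            "type": "string",
                            "fields": {"raw": raw_subfield},
                        }
                    },
                }
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_mapping(), indent=2))
```

The resulting JSON would be supplied when creating a new index. Existing documents would then have to be reindexed into it, as noted above.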

Thank you.
OK, that's what I was fearing: the cache is loaded regardless of the filter
condition. That's a shame: even if we filter heavily, targeting only one
document, we still need to fill up the cache!
I will try adding a lot of RAM, see whether memory usage stabilizes, and
leave the cache like that.
An alternative solution is to have many indexes; each index would act as a
pre-filter and contain far less data.
Do you know if the fielddata cache loads all docs, or only those in the
relevant shard? Would it help to have smaller shards?
On Monday, April 28, 2014 11:55:22 PM UTC+10, Binh Ly wrote:
> You are correct. Unfortunately the fielddata is loaded for all docs
> regardless of filter condition. [...]

When fielddata is loaded, does it load only the field the aggregation
needs (items.question_label.raw in this case), or does it load the full
_source of every match and extract the field?
On Monday, April 28, 2014 9:04:09 PM UTC-4, Olivier B wrote:
> Thank you.
> OK, that's what I was fearing: the cache is loaded regardless of the
> filter condition. [...]