Howdy,
We've got an index on an Elasticsearch 0.90.2 instance where we're trying to use
a grandparent -> parent -> child mapping. The ratio of documents is roughly
1:10:100 and the total number is around 25,000,000. We've found that the
parent ID cache is unexpectedly large for this use case. The _id field is a
32-character string, so naively we'd expect the ID cache to be about 2 GB.
Right now the statistics are showing it closer to 13 GB and growing. The
entire index itself is only 22 GB so having the ID cache be this big is
odd. Even stranger, nothing is actually doing a has_child/has_parent query
yet with the child mappings so I wouldn't expect the cache to even be
populated.
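
For reference, here is the back-of-the-envelope arithmetic behind that naive estimate, as a minimal sketch. The bytes-per-character values and the per-ID overhead parameter are my assumptions for illustration, not figures taken from the Elasticsearch internals.

```python
# Rough estimate of ID cache size if it held one copy of every document ID.
# Assumption: cost per ID = (characters * bytes per character) + fixed overhead.
def naive_id_cache_bytes(doc_count, id_chars, bytes_per_char, overhead_per_id=0):
    return doc_count * (id_chars * bytes_per_char + overhead_per_id)

DOCS = 25_000_000          # total documents, from the numbers above
ID_CHARS = 32              # length of the _id string
REPORTED = 13 * 10**9      # what the statistics currently show (decimal GB)

# Compare a 1-byte-per-char encoding with a 2-byte-per-char one (both hypothetical).
for label, bytes_per_char in (("1 byte/char", 1), ("2 bytes/char", 2)):
    estimate = naive_id_cache_bytes(DOCS, ID_CHARS, bytes_per_char)
    print(f"{label}: {estimate / 10**9:.1f} GB estimated, "
          f"reported is {REPORTED / estimate:.1f}x larger")
```

Even with the more generous assumption, the reported size is several times the estimate, which is what prompted the question.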
A lot of the child documents are short-lived, which may be having an impact,
as IDs are not reused.
There's a "reuse" flag in the SimpleIdCache module source that seems like
it might help but it defaults to off. Does anyone know exactly what that
flag does?
Also the ID cache doesn't seem to be clearable. Calling
_cache/clear?id_cache=true through the REST API doesn't seem to actually
clear the cache, at least according to the statistics. Then again, I'm not
sure I trust the ID cache size stats, as they currently indicate that the ID
cache takes up nearly the entire 16 GB heap allocation.
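
In case it helps anyone reproduce this, the clear call looks roughly like the sketch below. The host and port (localhost:9200), the helper name, and the optional index name are assumptions for illustration; the endpoint and the id_cache=true parameter are the ones mentioned above.

```python
import urllib.request

# Build (but do not send) the POST request that asks Elasticsearch to clear the ID cache.
# base_url and index are hypothetical; adjust them for your own cluster.
def build_clear_id_cache_request(base_url="http://localhost:9200", index=None):
    path = f"/{index}/_cache/clear" if index else "/_cache/clear"
    url = f"{base_url}{path}?id_cache=true"
    return urllib.request.Request(url, method="POST")

if __name__ == "__main__":
    request = build_clear_id_cache_request()
    # Sending requires a reachable node; the response body is printed as-is.
    with urllib.request.urlopen(request) as response:
        print(response.read().decode("utf-8"))
```

Comparing the ID cache statistics before and after this call is how I concluded the cache isn't being cleared.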
Has anyone else experienced this? Is there something I'm misreading here? I
know there's a performance improvement for has_child queries in 0.90.3 but
we're mostly expecting to use has_parent.
Cheers,
Dan