Hi Emilie,
The id_cache is a lookup from Lucene docId (int) to parent uid (string) for
both parent and child docs. Entries in the id_cache are tied to a part
of a Lucene index (a segment). Once a segment gets cleaned up, which happens
during the merging of segments, its entry in the id_cache is cleaned up as
well. There is no time-based expiration. Also, the id_cache contains all
the parent ids visible from the moment a refresh happens, never just a
selection for a specific has_child query.
Since version 0.90.7 (see GitHub issue #3819 in elastic/elasticsearch,
"Better warm-up of merged segments") we automatically update / create the
id_cache during the refresh. The reason behind this is that it prevents
multiple first search requests from each doing the id_cache initialisation
(double work).
The id_cache needs to be there for parent/child searches, so it is better
to create / update the id_cache in a controlled manner during the refresh.
If you don't want this behaviour you can either disable the automatic
refresh or disable the automatic warming that happens during a refresh. For
the latter you can set the index.warmer.enabled option to false, but this
also disables eager field data loading.
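
As a rough sketch of the two options above, the settings could be changed
with the update settings API. The host and index name are taken from the
quoted message below and are assumptions about your setup;
index.refresh_interval is the usual setting for the periodic refresh and is
not named elsewhere in this thread. You would apply one option or the other,
not necessarily both.

```shell
# Assumed values: adjust the host and index name for your cluster.
ES=localhost:9200
INDEX=avalanche-parent-child

# Option 1: turn off the automatic (periodic) refresh; -1 disables it.
curl -XPUT "$ES/$INDEX/_settings" -d '{"index": {"refresh_interval": "-1"}}'

# Option 2: keep refreshing, but disable the warming done during a refresh.
# Note that this also disables eager field data loading.
curl -XPUT "$ES/$INDEX/_settings" -d '{"index": {"warmer.enabled": false}}'
```

With the automatic refresh turned off, newly indexed documents only become
visible to searches after an explicit refresh, e.g.
curl -XPOST "$ES/$INDEX/_refresh".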
Having a NA id will reduce the size the id_cache needs for each document,
but each document with this NA id still needs some space in the id_cache.
I hope this explanation somewhat helps you to understand the parent/child
in ES!
Martijn
On 27 November 2013 17:43, Emilie Lavigne emilie.lavigne@gmail.com wrote:
I am using 0.90.7 and am surprised to see that indexing a document with a
parent document automatically increases the id_cache by about 150b.
I.e., if I run:
curl -XPOST 'localhost:9200/avalanche-parent-child/metadata/?parent=NA' -d '{"uuid":"123456789", "content": "some content"}'
And then look at the id_cache size (using curl
localhost:9200/_nodes/stats/indices?pretty), I automatically see the
id_cache size go up by about 150b even if no has_child/has_parent queries
or warmers were run.
A few questions:
- What gets stored in the id_cache in the presence of parent-child mappings?
- Is there a way to prevent an automatic refresh of the cache after
indexing a child document?
- When a query with has_child/has_parent is run, are all child->parent
mappings loaded into the id_cache, or just the ones that relate to the
query that was run?
- How long is the default expiry for the id_cache? Is there a way to
change it?
My use case: I have multiple documents that are in reality orphans. When
that is the case, they are added with a reference to a non-existent parent
called "NA". I am hoping I can prevent these documents from taking up
space in the id_cache, since we do not have sufficient cache space to
hold all _ids in the id_cache.
--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/28efba96-2683-49b3-a70e-8b9239451254%40googlegroups.com
.
For more options, visit https://groups.google.com/groups/opt_out.
--
Met vriendelijke groet,
Martijn van Groningen