Id_cache is growing at index time, without warmers or queries being run


(Emilie Lavigne) #1

I am using 0.90.7 and am surprised to see that indexing a document with a
parent document automatically increases the id_cache by about 150 bytes.

I.e., if I run:

curl -XPOST 'localhost:9200/avalanche-parent-child/metadata/?parent=NA' -d \
'{"uuid":"123456789", "content": "some content"}'

and then look at the id_cache size (using curl
localhost:9200/_nodes/stats/indices?pretty), I see the id_cache size go up
by about 150 bytes, even though no has_child/has_parent queries or warmers
were run.
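
One way to make that before/after comparison repeatable is sketched below. It assumes the pretty-printed node stats contain an "id_cache" object with a "memory_size_in_bytes" field, which is worth verifying on your version; id_cache_bytes is a hypothetical helper name, not part of Elasticsearch.

```shell
# Read pretty-printed node stats JSON on stdin and print the sum of the
# "memory_size_in_bytes" values found inside "id_cache" objects.
# Assumption: one key per line, as produced by ?pretty.
id_cache_bytes() {
  awk '
    /"id_cache"/ { in_cache = 1 }              # entered an id_cache object
    in_cache && /"memory_size_in_bytes"/ {
      gsub(/[^0-9]/, "", $0)                   # keep only the digits
      total += $0
      in_cache = 0                             # ignore sizes of other caches
    }
    END { print total + 0 }
  '
}
```

Running curl -s 'localhost:9200/_nodes/stats/indices?pretty' | id_cache_bytes before and after indexing a child document, and subtracting the two numbers, would give the growth per document.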

Four questions:

  • What gets stored in the id_cache in the presence of parent-child
    mappings?
  • Is there a way to prevent an automatic refresh of the cache after
    indexing a child document?
  • When a query with has_child/has_parent is run, are all child->parent
    mappings loaded into the id_cache, or just the ones that relate to the
    query that was run?
  • How long is the default expiry for the id_cache? Is there a way to
    change it?

My use case: I have multiple documents that are in reality orphans. When
that is the case, they are indexed with a reference to a non-existent parent
called "NA". I am hoping I can prevent these documents from taking up
space in the id_cache, since we do not have sufficient cache space to
hold all _ids in the id_cache.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/28efba96-2683-49b3-a70e-8b9239451254%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Martijn Van Groningen) #2

Hi Emilie,

The id_cache is a lookup from Lucene docId (int) to parent uid (string) for
both parent and child docs. Entries in the id_cache are tied to a segment (a
part of a Lucene index); once a segment is cleaned up, which happens when
segments are merged, its entry in the id_cache is cleaned up as well. There
is no time-based expiration. Also, the id_cache contains all the parent ids
visible as of the last refresh, never just a selection for a specific
has_child query.

Since version 0.90.7 (
https://github.com/elasticsearch/elasticsearch/issues/3819) we
automatically create / update the id_cache during the refresh. The reason
is that this prevents several first search requests from each doing the
id_cache initialisation (double work).

The id_cache needs to be there for parent/child searches, so it is better
to create / update the id_cache in a controlled manner during the refresh.
If you don't want this behaviour you can either disable the automatic
refresh or disable the automatic warming that happens during a refresh. For
the latter you can use the index.warmer.enabled option and set it to
false, but this also disables eager field data loading.
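
As a rough sketch of the two options above, assuming the index name from the original post and a node on localhost:9200, both can be changed through the update settings API. The names ES_HOST, INDEX and apply_settings are made up for illustration, and the setting names are worth checking against the documentation for your version.

```shell
# Hypothetical defaults; adjust to your cluster. The index name is the one
# used in the original post.
ES_HOST="${ES_HOST:-localhost:9200}"
INDEX="${INDEX:-avalanche-parent-child}"

# Option 1: disable warming on refresh. As noted above, this also disables
# eager field data loading.
WARMER_OFF='{"index": {"warmer": {"enabled": false}}}'

# Option 2: disable the periodic automatic refresh (-1 turns it off).
REFRESH_OFF='{"index": {"refresh_interval": "-1"}}'

# Send one of the bodies above to the index's _settings endpoint.
apply_settings() {
  curl -s -XPUT "$ES_HOST/$INDEX/_settings" -d "$1"
}
```

Calling apply_settings "$WARMER_OFF" would apply the first option. With the automatic refresh off, newly indexed documents only become searchable after an explicit refresh, for example curl -XPOST "$ES_HOST/$INDEX/_refresh".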

Having an NA id reduces the space the id_cache needs for each document,
but every document with this NA id still takes up some space in the id_cache.

I hope this explanation helps you understand how parent/child works
in ES!

Martijn

(In reply to Emilie Lavigne's message of 27 November 2013, 17:43.)

--
Kind regards,

Martijn van Groningen


