8.3 Memory calculations and nested fields

Hi,

I'm trying to better understand the new memory planning guidance in 8.3.

Do nested fields count just like other fields?

My indices have nested fields, so a single index, including its nested fields, has 100 fields. According to the above documentation, if I have 1000 indices this means 1000 (indices) x 100 (fields) x 1 kB = 100,000 kB ≈ 0.1 GB of heap. Is this correct?
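A minimal sketch of that arithmetic, assuming the 8.3/8.4 rule of thumb of 1 kB of heap per field per index (taken here as 1,024 bytes). The function name is purely illustrative:

```python
def field_heap_estimate_bytes(indices: int, fields_per_index: int,
                              bytes_per_field: int = 1024) -> int:
    """Rule-of-thumb heap for field mappers: indices x fields x bytes per field."""
    return indices * fields_per_index * bytes_per_field


estimate = field_heap_estimate_bytes(indices=1000, fields_per_index=100)
# Prints: 102400000 bytes = 0.095 GiB = 0.102 GB
print(f"{estimate} bytes = {estimate / 1024**3:.3f} GiB = {estimate / 1e9:.3f} GB")
```

So "0.1 GB" holds whether you read it as decimal gigabytes or round the binary figure.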

  1. Do nested fields count just like other fields in the index?
  2. Just to be sure: as of 8.3, is the number of shards no longer relevant for memory calculations?
  3. Does the amount of data in the index not matter at all? In other words, if two indices have exactly the same mapping (e.g. after a rollover), and one holds 100M records while the other holds a single record, is the memory calculation the same for both?

I believe nested fields count like normal fields, but the simplest way to find out is to upgrade to 8.5, which reports this overhead in the node stats.

There's still some amount of per-shard overhead but in most setups it's not worth considering.

Likewise, there's still some amount of per-segment overhead but it rarely matters.

Thanks David. I can see now that the documentation section "Data nodes should have at least 1kB of heap per field per index, plus overheads" from 8.3 & 8.4 has been replaced with "Allow enough heap for field mappers and overheads".

What I am trying to do is plan an upgrade from 7.x to the latest 8.5, so I cannot run these APIs to get answers yet. This is a problem because I am trying to plan the cluster's long-term memory requirements in advance. Planning by the expected number of indices, fields, etc. is possible, but since that guidance has now been dropped from the documentation, I can't rely on what the API returns without already having 8.5 set up (sorry if I am missing something).

Some questions:

  1. If I use the same mappings on a test 8.5 cluster, would total_deduplicated_mapping_size be identical after I upgrade production to 8.5? i.e. does it depend only on the mappings?
  2. The node stats are more problematic, as they are reported per node. How can I estimate total_estimated_overhead before I upgrade to 8.5? Does this number relate in any way to the number of indices? I reckon that the more indices there are, the larger this overhead becomes. Is this correct?
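Once an 8.5 cluster is available (for example a test cluster loaded with the same mappings), both figures can be read from the JSON returned by `GET _cluster/stats` and `GET _nodes/stats`. The sketch below only parses responses that have already been fetched. The field paths (`indices.mappings.total_deduplicated_mapping_size_in_bytes` in cluster stats, and `indices.mappings.total_estimated_overhead_in_bytes` under each node in node stats) are assumed from the 8.5 stats documentation and should be checked against your version. The sample responses and their numbers are hypothetical placeholders.

```python
import json


def dedup_mapping_size(cluster_stats: dict) -> int:
    # Assumed path in the 8.5 GET _cluster/stats response.
    return cluster_stats["indices"]["mappings"]["total_deduplicated_mapping_size_in_bytes"]


def overhead_per_node(nodes_stats: dict) -> dict:
    # Assumed path in each node's entry of the 8.5 GET _nodes/stats response.
    # Keys the result by node name, falling back to the node ID.
    return {
        node.get("name", node_id): node["indices"]["mappings"]["total_estimated_overhead_in_bytes"]
        for node_id, node in nodes_stats["nodes"].items()
    }


# Hypothetical, heavily trimmed responses with placeholder numbers.
cluster_stats = json.loads(
    '{"indices": {"mappings": {"total_deduplicated_mapping_size_in_bytes": 52428}}}'
)
nodes_stats = json.loads("""{"nodes": {
  "a1": {"name": "data-1", "indices": {"mappings": {"total_estimated_overhead_in_bytes": 51200000}}},
  "b2": {"name": "data-2", "indices": {"mappings": {"total_estimated_overhead_in_bytes": 51200000}}}
}}""")

print(dedup_mapping_size(cluster_stats))
for name, overhead in overhead_per_node(nodes_stats).items():
    print(name, overhead)
```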

Thanks.

If you have enough memory in 7.x then you will be fine in 8.x. The guidance has been updated in 8.x versions because of some significant reductions in heap usage in recent versions. I would suggest doing the upgrade first without changing the size of your cluster, and once the upgrade is complete you can start to measure things and think about reducing your cluster size.

Nevertheless, is the 8.3/8.4 documentation regarding the number of indices and memory calculations still relevant and correct for 8.5?

The 8.3/8.4 docs do not take account of mapping deduplication, so the memory usage in that area in 8.5 should be no worse (and will often be better).

Likewise, 8.5 still allows 1kiB per mapped field (see these docs) but this may well improve in future versions.
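For planning per node before upgrading, one rough approach is to apply the 1 KiB-per-mapped-field allowance to every index that has at least one shard on a given node. Two things here are assumptions rather than an official formula: nested fields are counted like any other field, and an index's mappings count once per node no matter how many of its shards that node holds. The index names, field counts and shard placement are hypothetical:

```python
from collections import defaultdict

BYTES_PER_MAPPED_FIELD = 1024  # 1 KiB per mapped field, per the 8.5 guidance


def estimated_overhead_per_node(field_counts: dict, shard_placement: dict) -> dict:
    """Sum 1 KiB per mapped field for every index with a shard on each node.

    field_counts: index name -> number of mapped fields (nested fields included).
    shard_placement: index name -> node names holding its shards (may repeat).
    """
    totals = defaultdict(int)
    for index, nodes in shard_placement.items():
        # dict.fromkeys drops repeated nodes but keeps their order, so an
        # index's mappings are counted only once per node (an assumption).
        for node in dict.fromkeys(nodes):
            totals[node] += field_counts[index] * BYTES_PER_MAPPED_FIELD
    return dict(totals)


field_counts = {"logs-000001": 100, "logs-000002": 100, "metrics-000001": 40}
shard_placement = {
    "logs-000001": ["data-1", "data-2"],
    "logs-000002": ["data-2", "data-3"],
    "metrics-000001": ["data-1", "data-1"],  # two shards on the same node
}
# Prints: {'data-1': 143360, 'data-2': 204800, 'data-3': 102400}
print(estimated_overhead_per_node(field_counts, shard_placement))
```

Under these assumptions the estimate grows with the number of indices (and fields) hosted on each node, not with the number of documents they contain.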

Thanks!
