Configuring index behavior between insert phase and query phase

Hi all,
We are struggling with how to configure our index across the 3 phases it has to cope with (number of nodes, number of shards, segment merging, index sorting, size limits, and any other parameters that may help us :wink:):

Our app uses Elasticsearch in 3 main phases:
1. Initial massive insert:

  • 2.5 Billion docs / 10 days (3000 docs / sec)
  • the docs have parent-child relations (avg of 1.4 children per parent)
  • the children contain nested fields, each holding anywhere from 1 to 1M short-text values (i.e. nested documents)
  • avg doc size (including the nested docs from the nested fields) is 1 KB
2. Massive(ish) update:
  • 15% of the documents (children) are updated by adding values to their nested fields (i.e. more nested docs, but small and homogeneous ones)
  • these updates are based on several (limited) aggregation queries
3. Free-form user queries:
  • each user query triggers ~15 additional aggregation queries on its results (this sucks, we know)

Steps 1-3 repeat every x days (a new cycle never starts before step 2 has completed), though the expected data volume is much smaller (at most ~10% of the initial number of documents, i.e. max 250M).
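
To make the scale concrete, here is a rough back-of-envelope sketch of the figures above. It only restates our numbers; the names are made up for illustration, and it assumes 1 KB means 1000 bytes, that the 15% applies to the total document count, and that replicas and index overhead are ignored.

```python
# Back-of-envelope sizing from the figures in this post (illustrative names).
DOCS_INITIAL = 2_500_000_000   # phase 1: initial load
LOAD_DAYS = 10                 # phase 1: load window
AVG_DOC_BYTES = 1_000          # assumption: "1 KB" taken as 1000 bytes
UPDATE_PERCENT = 15            # phase 2: assumed to apply to all docs
RECURRING_PERCENT = 10         # later cycles: upper bound on new data


def docs_per_second(docs: int, days: int) -> float:
    """Average sustained ingest rate over the whole load window."""
    return docs / (days * 24 * 60 * 60)


def raw_terabytes(docs: int, avg_bytes: int) -> float:
    """Raw source size in TB, before replicas and index overhead."""
    return docs * avg_bytes / 1e12


def sizing_summary() -> dict:
    """Collect the derived numbers for the three phases."""
    return {
        "ingest_docs_per_sec": round(docs_per_second(DOCS_INITIAL, LOAD_DAYS)),
        "raw_tb": raw_terabytes(DOCS_INITIAL, AVG_DOC_BYTES),
        "updated_docs": DOCS_INITIAL * UPDATE_PERCENT // 100,
        "recurring_docs_max": DOCS_INITIAL * RECURRING_PERCENT // 100,
    }


if __name__ == "__main__":
    for name, value in sizing_summary().items():
        print(f"{name}: {value}")
```

The average ingest rate comes out a little under the 3000 docs / sec quoted above, and the raw data is about 2.5 TB before any replication.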

So... basically we are looking for a knight in shining armor to come and rescue us,
and if not,
then at least some good references, because the web lacks documentation on how to cope with an index's different usage phases.

Milkana and Nirit
