I'm working on a little project that indexes log data (using one index per
day). For now I have just one ES node for testing, indexing at a rather
low volume (up to 1 million messages / day). While monitoring the
CPU consumption of that node I found that it was slowly but significantly
increasing over the course of the 'indexing day'. I assume this is because
the index grows, making certain operations more expensive over time. I
used the hot threads API and found that index merge operations seem to be
consuming that CPU, which brings me to the actual question:
What would be the recommended merge policy for this special log data set?
It's different from normal content in that it never changes (i.e. no
updates) and never gets deleted (only a whole index gets deleted). There
are only 'new' documents being added at a rather constant rate. I guess
there could be special merge settings for this type of traffic.
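
For reference, here is a rough sketch of the setup described above: one
index per day, plus the hot threads request used to spot the merge activity.
The base URL, the "logs" prefix, the date format, and all function names are
assumptions made for illustration. Only the `_nodes/hot_threads` endpoint and
its `threads` parameter are taken from the Elasticsearch API.

```python
from datetime import date
from urllib.request import urlopen

# Assumption: a single local test node on the default HTTP port.
ES_URL = "http://localhost:9200"


def daily_index_name(day: date, prefix: str = "logs") -> str:
    """Build the per-day index name, e.g. 'logs-2012.05.01'.

    The prefix and the date format are hypothetical naming choices.
    """
    return f"{prefix}-{day:%Y.%m.%d}"


def hot_threads_url(base_url: str = ES_URL, threads: int = 3) -> str:
    """Build the URL of the nodes hot threads endpoint."""
    return f"{base_url}/_nodes/hot_threads?threads={threads}"


def fetch_hot_threads(base_url: str = ES_URL, threads: int = 3) -> str:
    """Return the plain-text hot threads report (needs a running node)."""
    with urlopen(hot_threads_url(base_url, threads), timeout=10) as resp:
        return resp.read().decode("utf-8")
```

Calling `fetch_hot_threads()` periodically over the day and looking for
merge-related stack frames in the report is roughly how the CPU use was
attributed to merging.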