We are performance-testing various types of environments in order to properly size the nodes for a number of Elasticsearch cluster installations.
During long-term indexing tests we see a sharp drop in indexing performance at the beginning, which levels off after some time: write throughput is up to 5 times higher initially than after 6 hours. Typically after 6-8 hours performance becomes more or less stable and does not decrease much further.
See the following screenshot: the blue bars are the number of documents written per time frame. The initial performance is quite good, but drops quickly.
We mostly use Elasticsearch 2.1.1, and we see this pretty much across the board, from single-node machines with 16 GB RAM and 4 cores up to 6-node clusters with 32 GB and 8 cores.
We applied the common performance optimizations mentioned in the documentation, and we usually use bulk indexing with bulks of 5,000 documents / 5 MB.
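To illustrate what we mean by 5k/5MB bulks, our batching logic works roughly along these lines (a minimal pure-Python sketch; the actual Elasticsearch `_bulk` request is out of scope here):

```python
import json

MAX_DOCS = 5000
MAX_BYTES = 5 * 1024 * 1024  # 5 MB of serialized JSON

def make_bulks(docs, max_docs=MAX_DOCS, max_bytes=MAX_BYTES):
    """Yield lists of documents, each list staying within both limits.

    A batch is flushed as soon as adding the next document would
    exceed the document count or the byte budget, whichever comes first.
    """
    batch, batch_bytes = [], 0
    for doc in docs:
        doc_bytes = len(json.dumps(doc).encode("utf-8"))
        if batch and (len(batch) >= max_docs or batch_bytes + doc_bytes > max_bytes):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(doc)
        batch_bytes += doc_bytes
    if batch:
        yield batch
```

Each yielded batch is then sent as one bulk request.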
When doing performance analysis, we see the system is mostly CPU-bound, probably because the documents are fairly large (approx. 1.5 KB of JSON each) and include a few nested documents.
See this screenshot from dynaTrace: it shows CPU usage on the left and some Elasticsearch-specific metrics in the middle. There does not seem to be any throttling kicking in, and networking is not an issue, as this test even ran the indexing client on the same machine. Read operations are constant, while write operations decrease along with the number of documents written.
Is this expected behavior because segment merging only kicks in after some time? However, 6 hours seems a bit long for merging to catch up when writing a few thousand documents per second.
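To check the merging theory, we look at the per-index merge statistics (`GET <index>/_stats/merges`). A sketch of how we read the relevant counters; the `sample` dict is an abbreviated, illustrative response body, not real data from our cluster:

```python
# Pull the merge counters out of a GET <index>/_stats/merges response
# to see whether merges are piling up or spending time throttled.
def merge_summary(stats, index):
    m = stats["indices"][index]["total"]["merges"]
    return {
        "current_merges": m["current"],
        "total_merge_time_s": m["total_time_in_millis"] / 1000.0,
        "throttled_time_s": m["total_throttled_time_in_millis"] / 1000.0,
    }

# Abbreviated, made-up example of what the stats API returns.
sample = {
    "indices": {
        "myindex": {
            "total": {
                "merges": {
                    "current": 3,
                    "total_time_in_millis": 7200000,
                    "total_throttled_time_in_millis": 1800000,
                }
            }
        }
    }
}
```

A steadily growing `throttled_time_s` or a persistently high `current_merges` would point at merging as the cause of the slowdown.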
Or are we running into some configuration limit that throttles the indexing rate? Or thread pool limits?
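To rule out thread pool saturation, we also check the bulk thread pool for rejections, e.g. via `GET _cat/thread_pool?h=host,bulk.active,bulk.queue,bulk.rejected`. A sketch of how we scan that output; the `sample` text is made-up illustrative output, not from our cluster:

```python
# Parse _cat/thread_pool output (columns: host, bulk.active,
# bulk.queue, bulk.rejected) and flag nodes that rejected bulk requests.
def bulk_rejections(cat_output):
    """Return {host: rejected_count} for nodes with bulk rejections."""
    result = {}
    for line in cat_output.strip().splitlines():
        host, active, queue, rejected = line.split()
        if int(rejected) > 0:
            result[host] = int(rejected)
    return result

# Made-up example output: node2 has rejected bulk requests.
sample = """\
node1 4 42 0
node2 8 50 1234
"""
```

A non-zero rejected count would mean the bulk queue is overflowing, which would show up on the client side as rejections rather than as a gradual slowdown, so we suspect this is not our case, but we would like to confirm.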