We're dealing with a performance issue with Elasticsearch 1.5.2 (we also saw it on 1.5.1 and earlier versions).
Our ES is a single-node instance on a Xeon with 12 HT cores and 64 GB of RAM; we plan to set up an ES cluster later this year. We configured both $ES_MIN_MEM and $ES_MAX_MEM to 32g. We also made these changes to the default configuration file:
threadpool.search.type: fixed
threadpool.search.size: 24
threadpool.search.queue_size: 200000
bootstrap.mlockall: true
http.cors.enabled: true
script.disable_dynamic: false
We started our tests on an index with one shard, no replicas, and a single mapping holding 400k docs (each with 10 subdocs) like this one:
indexed with the _id as a concatenation of task_id and chain_id; each of the 400k docs has a different "user_id" value.
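To make the description concrete, here is a rough sketch of one such document and its _id; only user_id, task_id, and chain_id come from the description above, so the other field names and the value formats are purely hypothetical:

```python
import json

# Hypothetical sketch of one of the 400k documents. The subdoc
# fields are invented for illustration; only user_id, task_id and
# chain_id are mentioned in the original description.
task_id = "t-0001"
chain_id = "c-0042"
doc_id = task_id + "_" + chain_id  # _id = concatenation of task_id and chain_id

doc = {
    "user_id": "u-31337",  # unique per doc in the 400k-doc test
    "task_id": task_id,
    "chain_id": chain_id,
    "subdocs": [{"seq": i, "payload": "..."} for i in range(10)],  # 10 subdocs each
}
print(doc_id)
print(json.dumps(doc)[:80])
```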
Our target is to retrieve documents related to a set of users.
We cannot use a MultiGet to retrieve the docs for a set of users, since the _id of the docs is not the user_id, so we use a TermsFilter. (We could of course change the structure of the docs, but having the chain_id and task_id as the _id makes our life a lot simpler.) We are benchmarking this call because we'd like performance to stay stable as the number of docs in the index grows.
We tested a TermsFilter on the "user_id" field with 20 random distinct values, wrapped in a ConstantScoreQuery: we sent 200k requests to ES in chunks of 4k/sec. The result was about 1400 responses/sec.
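For reference, each request in this benchmark is equivalent to the following ES 1.x query-DSL body (a sketch: the terms filter on user_id wrapped in a constant_score query; the user_id values shown are placeholders drawn at random):

```python
import json
import random

# Sketch of the benchmark request body: a terms filter on user_id
# wrapped in a constant_score query (ES 1.x DSL). The user_id
# values here are hypothetical placeholders.
user_ids = ["u-%05d" % random.randrange(400000) for _ in range(20)]

body = {
    "query": {
        "constant_score": {
            "filter": {
                "terms": {"user_id": user_ids}
            }
        }
    }
}
print(json.dumps(body, indent=2))
```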
Then we grew the index, doubling the number of docs to 800k (keeping user_ids unique); with the same TermsFilter on the "user_id" field with 20 random values, again 200k requests in chunks of 4k/sec, ES still managed about 1400 resp/sec.
Then we doubled the number of docs again, reaching 1.6M, and the same test yielded about 1200 resp/sec.
We then decided to grow the index much further, reaching 16M documents; we scaled the test down to a total of 6k requests in chunks of 300/sec (still with 20 terms per filter), and ES managed about 200 resp/sec. We built another index with the same number of documents but with 5 shards, and the results were better: about 400 resp/sec.
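Creating the 5-shard variant of the index amounts to a PUT with a settings body like the following (a sketch; the index name is hypothetical, and replicas are set to 0 to match the single-node setup):

```python
import json

# Hypothetical settings body for the 5-shard test index (ES 1.x).
# Sent as: PUT /tasks_16m_5shards
settings = {
    "settings": {
        "number_of_shards": 5,    # vs. 1 shard in the original index
        "number_of_replicas": 0,  # single node, so no replicas
    }
}
print(json.dumps(settings))
```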
We noticed that if, for example, the total running time of the test is 20 seconds, during the first 17 seconds we receive only a few responses (a few hundred at most), and in the remaining 3 seconds we receive the missing thousands all at once, like a flush of accumulated responses. The higher the number of requests, the longer the wait before the flush; the flush duration seems proportional to the amount of data requested, so we guess it is a function of the bandwidth between the client and the server (we hit the ~12 MB/sec limit of our 100 Mbit network).
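The bandwidth figure above checks out: a 100 Mbit/s link tops out at about 12.5 MB/s of payload, which matches the observed ~12 MB/sec ceiling.

```python
# Sanity check of the link-saturation claim: 100 Mbit/s divided by
# 8 bits per byte gives the theoretical maximum in MB/s.
link_mbit_per_sec = 100
max_mb_per_sec = link_mbit_per_sec / 8  # theoretical ceiling, before protocol overhead
print(max_mb_per_sec)
```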
How is this behaviour explained? Shouldn't ES return responses as soon as the requests arrive?
We'd like to have stable performance: should we add nodes to the cluster once the index reaches a certain number of docs, and keep doing so as the doc count grows? Is using more than one shard per node a good idea?