Hi,
we're dealing with a performance issue with Elasticsearch 1.5.2 (we also had it on 1.5.1 and earlier versions).
Our ES setup is a single-node instance on a Xeon with 12 HT cores and 64 GB of RAM; we plan to set up an ES cluster later this year. We set both $ES_MIN_MEM and $ES_MAX_MEM to 32g. We also made these changes to the default configuration file:
threadpool.search.type: fixed
threadpool.search.size: 24
threadpool.search.queue_size: 200000
bootstrap.mlockall: true
http.cors.enabled: true
script.disable_dynamic: false
We started our tests on an index with one shard, no replicas, and a single mapping, holding 400k docs (each with 10 sub-documents) like this one:
{"user_id":201512,"chain_id":7735,"task_id":8856,"time_slices":[{"slice_end_time":4603,"slice_start_time":4505,"slice_time_effort":10},{"slice_end_time":6942,"slice_start_time":8835,"slice_time_effort":10},{"slice_end_time":8679,"slice_start_time":6006,"slice_time_effort":10},{"slice_end_time":7701,"slice_start_time":2305,"slice_time_effort":10},{"slice_end_time":5978,"slice_start_time":1504,"slice_time_effort":10},{"slice_end_time":8576,"slice_start_time":2713,"slice_time_effort":10},{"slice_end_time":6037,"slice_start_time":1160,"slice_time_effort":10},{"slice_end_time":8141,"slice_start_time":8894,"slice_time_effort":10},{"slice_end_time":6278,"slice_start_time":7509,"slice_time_effort":10},{"slice_end_time":3711,"slice_start_time":350,"slice_time_effort":10}]}
Each document is indexed with an _id that is the concatenation of task_id and chain_id; each of the 400k docs has a different "user_id" value.
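Roughly, we index each document like this (a simplified sketch using the ES 1.x Java API; the index/type names, the separator in the _id and the variable names are just placeholders, not our exact code, and `client` is an already-built Client):

// explicit _id built from task_id and chain_id (the "_" separator here is only for illustration)
String docId = taskId + "_" + chainId;
client.prepareIndex("tasks", "task", docId)   // index name, type name, explicit _id
      .setSource(jsonDoc)                      // the JSON document shown above, as a String
      .execute().actionGet();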
Our goal is to retrieve the documents related to a given set of users.
We cannot use a MultiGet to fetch the docs for a set of users (since the docs' _id is not the user_id), so we use a TermsFilter instead (we could of course change the structure of the docs, but having chain_id and task_id as the _id makes our life a lot simpler). We are benchmarking this call because we'd like performance to stay stable as the number of docs in the index grows.
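For reference, the query we benchmark is built roughly like this (a simplified sketch of what our code does with the ES 1.x Java API; the index/type names and the user_id values are placeholders, and `client` is our already-built Client):

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.index.query.FilterBuilders;
import org.elasticsearch.index.query.QueryBuilders;

// terms filter on user_id, wrapped in a constant_score query (we don't need scoring),
// sent against a single index/type ("tasks"/"task" are placeholder names)
SearchResponse resp = client.prepareSearch("tasks")
        .setTypes("task")
        .setQuery(QueryBuilders.constantScoreQuery(
                FilterBuilders.termsFilter("user_id", 201512, 201513, 201514 /* ... 20 values ... */)))
        .setSize(20)   // one doc per user_id expected
        .execute().actionGet();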
We tested a TermsFilter on the "user_id" field with 20 different random values, wrapped in a ConstantScoreQuery: we sent ES 200k requests in chunks of 4k/sec (our load generator is sketched below) and got back about 1400 responses/sec.
Then we doubled the index to 800k docs (keeping the user_ids unique); with the same TermsFilter on "user_id" with 20 random values, again 200k requests in chunks of 4k/sec, ES still handled about 1400 resp/sec.
Then we doubled the number of docs again, reaching 1.6M, and the same test gave about 1200 resp/sec.
We then grew the index much further, to 16M documents, and scaled the test down to a total of 6k requests in chunks of 300/sec (still with 20 terms in the filter): ES handled about 200 resp/sec. We built another index with the same number of documents but with 5 shards, and the results were better, about 400 resp/sec.
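For completeness, the load generator works roughly like this (a heavily simplified sketch with the ES 1.x Java API; TOTAL_REQUESTS, CHUNK_SIZE and pickRandomUserIds() are hypothetical names, exception handling is omitted, and the real test harness is more involved):

final AtomicLong received = new AtomicLong();                // java.util.concurrent.atomic.AtomicLong
for (int chunk = 0; chunk < TOTAL_REQUESTS / CHUNK_SIZE; chunk++) {
    for (int i = 0; i < CHUNK_SIZE; i++) {
        long[] userIds = pickRandomUserIds(20);              // 20 random user_ids per request
        client.prepareSearch("tasks")
              .setQuery(QueryBuilders.constantScoreQuery(
                      FilterBuilders.termsFilter("user_id", userIds)))
              .setSize(20)
              .execute(new ActionListener<SearchResponse>() {   // async, so we can keep the request rate up
                  public void onResponse(SearchResponse r) { received.incrementAndGet(); }
                  public void onFailure(Throwable t) { /* count/log the error */ }
              });
    }
    Thread.sleep(1000);                                       // one chunk per second (interrupt handling omitted)
}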
We noticed that if, for example, the total running time of the test is 20 seconds, during the first 17 seconds we receive only a few responses (a few hundred at most), and in the remaining 3 seconds we receive the missing thousands all at once, like a flush of accumulated responses. The higher the number of requests, the longer the "waiting" period before the flush; the flush duration seems proportional to the amount of data requested, so I guess it is a function of the bandwidth between client and server (we hit the ~12 MB/sec limit of a 100 Mbit network).
How can this behaviour be explained? Shouldn't ES be sending back each response as soon as the corresponding request has been processed?
We'd like stable performance: should we add nodes to the cluster once we reach a certain number of docs in the index, and keep doing so as the doc count grows? Is using more than one shard per node a good idea?
Thanks,
Andrea