To get acquainted with Elasticsearch and test its write throughput, I've set up a simple PHP script that inserts a JSON document with 3 fields, like so:
    private function registerName($name) {
        $ip = $_SERVER['REMOTE_ADDR'];
        $data = [
            "name" => $name,
            "ip" => $ip,
            "registerTime" => time()
        ];
        $db = $this->dbConnect();
        // Index the document into the 'users' index
        $params = [
            'index' => 'users',
            'type' => 'json',
            'body' => $data
        ];
        $response = $db->index($params);
        return $response["_id"];
    }

    // Builds a new Elasticsearch client on every call
    function dbConnect() {
        return Elasticsearch\ClientBuilder::create()->setHosts(["172.31.41.71"])->build();
    }
Unfortunately, I'm only getting a throughput of 5.5k docs/s through this script. I'm running the script from a different server than the Elasticsearch server, because with both on the same machine I was getting only 3.3k docs/s (which makes sense). The script is run with a concurrency of 800.
The two machines (Elasticsearch and PHP client) are both c5.2xlarge instances on Amazon AWS. This is a standard Elasticsearch install with no settings altered other than the IP address it binds to. I increased the disk of the ES instance to 1 TB, which gives me 3,000 IOPS. I benchmarked it at 148 MB/s (megabytes) using 'dd'. While firing the PHP script I can see the disk I/O is around 10 MB/s, sometimes touching 30 MB/s and then quickly dropping back down. The PHP client doesn't appear to be the bottleneck, since adding a second one doesn't increase my throughput.
I was expecting roughly 65k docs/s, so this was rather disappointing. Interestingly, benchmarking the instance using Rally does give me 66k docs/s on the 'index-append' test, but I'm not sure if that test is comparable to my use-case. I can also see the instance disk is running at ~100 MB/s during this test, so a lot more than the ~10 MB/s I'm getting with my own test. I'm guessing the Rally test uses one connection (or a small number of connections) and bulks the documents as much as possible.
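To make that guess concrete: as far as I understand it, a bulked version of my insert with the same PHP client would look roughly like the sketch below. `buildBulkParams` is a hypothetical helper that is not part of my script, the sample names and IPs are made up, and the `_type` field assumes the same mapping type ('json') that my single-document insert uses.

```php
<?php
// Hypothetical helper (not part of my script): turns a list of documents into
// the alternating action/document array that the client's bulk() method takes.
function buildBulkParams(array $docs, string $index = 'users', string $type = 'json'): array
{
    $params = ['body' => []];
    foreach ($docs as $doc) {
        // Action entry: index the next document into the given index/type.
        $params['body'][] = ['index' => ['_index' => $index, '_type' => $type]];
        // Source entry: the document itself.
        $params['body'][] = $doc;
    }
    return $params;
}

// Two made-up documents shaped like the ones registerName() builds.
$docs = [
    ['name' => 'alice', 'ip' => '10.0.0.1', 'registerTime' => time()],
    ['name' => 'bob',   'ip' => '10.0.0.2', 'registerTime' => time()],
];
$params = buildBulkParams($docs);

// With a real client, all documents would then go out in a single request:
// $response = $db->bulk($params);
```

That is one HTTP request for many documents, whereas my script issues one request (and builds one client) per document.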
Can someone tell me if ES fits my use-case (many individual clients/connections, short-lived connections, 1 insert each) and, if so, what I need to do to reach the desired throughput? I was hoping to reach ~200k docs/s after sharding across 3 instances.
Perhaps of interest, here's an htop snapshot during the test with the PHP client:
And here's the output from wrk (the tool used to call the PHP script over HTTP):
    Running 1m test @ http://172.31.44.222/registerRandom
      4 threads and 800 connections
      Thread Stats   Avg      Stdev     Max   +/- Stdev
        Latency   166.97ms  153.43ms   1.93s    86.04%
        Req/Sec     1.41k   187.60     2.40k    72.17%
      336983 requests in 1.00m, 84.46MB read
      Socket errors: connect 0, read 0, write 0, timeout 55
    Requests/sec:   5610.74
    Transfer/sec:      1.41MB
Please note that I realise this is a sub-optimal approach. I'm using it because it's fairly realistic for our use-case (it mimics 'real visitors').
Full Rally results: https://pastebin.com/xpJjmkuT