Disappointing insert throughput (non-bulk)

jaki_tori · July 10, 2018, 2:26pm

To get acquainted with Elasticsearch and test its write throughput, I've set up a simple PHP script that inserts a JSON document with 3 fields, like so:

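private function registerName($name) {

	// Document to index: the name, the caller's IP and a Unix timestamp
	$ip = $_SERVER['REMOTE_ADDR'];
	$data = [
		"name" => $name,
		"ip" => $ip,
		"registerTime" => time()
	];

	$db = $this->dbConnect();

	// Index a single document (one request per call, no bulking)
	$params = [
		'index' => 'users',
		'type' => 'json',
		'body' => $data
	];
	$response = $db->index($params);

	// Return the ID Elasticsearch generated for the document
	return $response["_id"];
}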

function dbConnect() {
	return Elasticsearch\ClientBuilder::create()->setHosts(["172.31.41.71"])->build();
}

Unfortunately, I'm only getting a throughput of 5.5k docs/s through this script. I'm running the script from a different server than the Elasticsearch server, because when I had them on the same machine I was getting only 3.3k docs/s (which makes sense). The script is run with a concurrency of 800.

The two machines (Elasticsearch and PHP client) are both c5.2xlarge instances on Amazon AWS. This is a standard Elasticsearch install with no settings altered other than the IP address it binds to. I upped the disk of the ES instance to 1 TB, which gives me 3000 IOPS. I benched it at 148 MB/s (megabytes) using 'dd'. While firing the PHP script I can see the disk I/O is around 10 MB/s, sometimes touching 30 MB/s and then quickly dropping back down. The PHP client doesn't appear to be the bottleneck, considering that adding another one doesn't increase my throughput.

I was expecting roughly 65k docs/s so this was rather disappointing. Interestingly, benching the instance using Rally does give me 66k/s on the 'index-append' test but I'm not sure if that test is comparable to my use-case. I can also see the instance disk is running at ~100MB/s during this test, so a lot more than the ~10MB/s I'm getting with my own test. I'm guessing the Rally test uses one (or a minimal # of) connection(s), and is bulking them as much as possible.

Can someone tell me if ES fits my use-case (many individual clients/connections, short-lived connections, 1 insert each) and if so, what I need to do to reach the desired throughput? I was hoping to reach ~200k docs/s after sharding across 3 instances.

Perhaps interesting: here's an htop snapshot taken during the test with the PHP client:

And here's the output from wrk (the tool used to call the PHP script over HTTP):

Running 1m test @ http://172.31.44.222/registerRandom
  4 threads and 800 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   166.97ms  153.43ms   1.93s    86.04%
    Req/Sec     1.41k   187.60     2.40k    72.17%
  336983 requests in 1.00m, 84.46MB read
  Socket errors: connect 0, read 0, write 0, timeout 55
Requests/sec:   5610.74
Transfer/sec:      1.41MB

Please note that I realise this is a sub-optimal approach. I'm using it because it's fairly realistic for our use-case (it mimics 'real visitors').

Full Rally results: https://pastebin.com/xpJjmkuT

Christian_Dahlqvist · July 10, 2018, 2:36pm

Bulk indexing will, as described in the documentation, give much better throughput than indexing individual documents, so what you are seeing is expected. Rally uses bulk requests, which explains the difference in performance. Why are you not using bulk requests when indexing?
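For comparison with the single-document `index()` call in the question, here is a minimal sketch of what a bulk request could look like with the same PHP client. It assumes the `users` index and `json` type from the original script; `buildBulkParams` is a hypothetical helper, and the sample names and IPs are made up. The helper only builds the request parameters, so it runs without a cluster.

```php
<?php
// Hypothetical helper (not from the thread): turns a list of documents into
// the alternating action/source array that the PHP client's bulk() accepts.
function buildBulkParams(array $docs, string $index, string $type): array
{
    $params = ['body' => []];
    foreach ($docs as $doc) {
        // Action entry: index the next document into $index/$type.
        $params['body'][] = ['index' => ['_index' => $index, '_type' => $type]];
        // Source entry: the document itself.
        $params['body'][] = $doc;
    }
    return $params;
}

// Example batch; the values are invented for illustration.
$docs = [
    ['name' => 'alice', 'ip' => '10.0.0.1', 'registerTime' => 1531232760],
    ['name' => 'bob',   'ip' => '10.0.0.2', 'registerTime' => 1531232761],
];
$params = buildBulkParams($docs, 'users', 'json');

// With a client built as in dbConnect(), a single request would then carry
// both documents:
// $response = $client->bulk($params);
```

The point of the sketch is that many documents share one HTTP round trip, which is the main difference from issuing one `index()` call per document.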

jaki_tori · July 10, 2018, 2:46pm

Thanks for your answer. The reason is our use case: user registration, using PHP as the server-side language. 1 user = 1 PHP process = 1 connection. And the insert may not be delayed, since the user is waiting for their user ID so they can continue with our service. There is no possibility for bulking there. I guess that means ES is not a good fit for this use-case. Which is fine, of course.
