Bulk indexing performance over a high-latency network

We have a 3-node ES cluster set up in AWS.
Running the default Rally benchmark from another instance in the same AWS VPC (using --benchmark-only) results in an indexing rate of ~45,000 docs/s. Running the same benchmark from a VM in our own data center over a VPN results in ~2,500 docs/s. We get even worse results with our own benchmark: 20,000 docs/s vs. 200 docs/s.
The VPN latency is 100ms - that's a lot, but it doesn't quite explain the huge performance difference. If we assume that each batch takes 1s to process, the latency should add only 0.2s to the end-to-end batch processing time.
We are using the default config, so http.keep_alive is true. The network throughput is ~12MB/s (we are nowhere near saturating it).
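
As a rough sanity check of that reasoning (just a sketch; it assumes a single synchronous bulk request in flight at a time and ~0.2s of extra round trip per batch):

```python
# Back-of-the-envelope check of the latency assumption
# (assumes one synchronous bulk request in flight at a time).
batch_time = 1.0   # assumed processing time per bulk batch, seconds
added_rtt = 0.2    # extra round trip over the 100ms VPN link, seconds

expected_slowdown = (batch_time + added_rtt) / batch_time
observed_slowdown = 45_000 / 2_500

print(f"expected slowdown: {expected_slowdown:.1f}x")  # 1.2x
print(f"observed slowdown: {observed_slowdown:.1f}x")  # 18.0x
```

So the latency alone should cost us around 20%, not an 18x slowdown.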

Any advice on how to solve/debug this?

@danielmitterdorfer (for Rally advice)

Hi @eugene_miretsky,

one (logical) bulk request is not just one network packet but is transferred with HTTP chunked transfer encoding, which means it is split across multiple network packets. So I suspect your assumption that the 100ms latency adds just 200ms per batch does not hold. I'd capture the network packets to see what's going on; you can use Wireshark for that.
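
In addition to the capture, it might help to time a single bulk request from both locations to see how much wall-clock time one batch really costs over the VPN. A minimal sketch with the Python client (host, index name and document shape are just placeholders; it requires elasticsearch-py, and older ES/client versions may also need a _type in each action):

```python
# Time one bulk request end-to-end; run the same script once from the
# VPC instance and once from the data-center VM and compare the rates.
import time
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch(["http://10.0.0.1:9200"])  # placeholder host

# 5,000 tiny documents; non-underscore keys become the document source.
docs = ({"_index": "bulk-latency-test", "value": i} for i in range(5000))

start = time.perf_counter()
success, errors = helpers.bulk(es, docs)
elapsed = time.perf_counter() - start

print(f"indexed {success} docs in {elapsed:.2f}s "
      f"-> {success / elapsed:,.0f} docs/s")
```

Comparing the two numbers should make the per-request overhead obvious.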

Daniel

Thanks Daniel!

I'm not exactly a networking expert, but I gave Wireshark my best shot. Attached are screenshots of Wireshark captures over:

  1. a good connection (same VPC in AWS)
  2. a slow connection (100ms latency over the VPN)

As you can see, the packet size is 10x smaller, and an ACK is being sent for every packet (instead of one ACK for a batch of packets). The duplicate ACK rate is also 3%. Any idea what's causing this?

From what I understand, ES is using HTTP chunking, and it looks like the chunks are much smaller in the latter case. Is there a way to tune this?

Hi Eugene,

just as a heads-up: I'll try to look into this more closely, but it could take a bit of time until I can spare some cycles.

Daniel

@danielmitterdorfer Sure - any help would be appreciated.

Any idea how to enable HTTP chunking and compression in Rally?

Hi Eugene,

do you have the original packet dumps around? Would be great if you could share them with me.

Compression is not supported out of the box by the Python Elasticsearch client (i.e. it's not possible without writing custom code), but I have some code lying around that does this (I needed it for an HTTP compression benchmark in Elasticsearch).

I'll see what I can do to integrate that into Rally.
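
Just to illustrate the idea (this is not the exact code I have): you gzip the bulk body on the client side and send it with a Content-Encoding: gzip header. A minimal sketch using the requests library, assuming http.compression is enabled on the cluster and with placeholder host and index names:

```python
# Send a gzip-compressed _bulk request directly over HTTP.
import gzip
import json
import requests

def compressed_bulk(host, index, docs):
    # Build the newline-delimited bulk body: action line + source line per doc.
    # (Older ES versions also expect a "_type" in the action line.)
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    body = ("\n".join(lines) + "\n").encode("utf-8")

    resp = requests.post(
        f"{host}/_bulk",
        data=gzip.compress(body),
        headers={
            "Content-Encoding": "gzip",
            "Content-Type": "application/x-ndjson",
        },
    )
    resp.raise_for_status()
    return resp.json()

# Example (placeholder values):
# compressed_bulk("http://10.0.0.1:9200", "bulk-latency-test",
#                 [{"value": i} for i in range(5000)])
```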

Daniel

> Any idea how to enable HTTP chunking and compression in Rally?

If you use the latest master, you can specify arbitrary client options (see the docs). Otherwise, this will be supported starting with Rally 0.4.0.

For convenience: --client-options="compressed:true,timeout:90,request_timeout:90" (the latter two options are needed because specifying client options overrides the defaults, and I assume you don't want to change those values).

Edit: Expect that query latency will increase with compression (at least it does in my experience). Bulk indexing throughput should be roughly identical.

I tried to add support for disabling chunking, as hinted at by Honza in elasticsearch-py/#422, but it did not work well with the rest of this feature, so it stays unsupported for now. I am always happy to receive PRs, though. :wink:

Daniel