Recently we switched our cluster to EC2 and everything is working great... except percolation.
To reindex (and percolate) our data we use a separate EC2 c3.8xlarge instance (32 cores, 60 GB RAM, 2 x 160 GB SSD) and tell our index to include only this node in allocation.
Because we'll distribute it amongst the rest of the nodes later, we use 10 shards, no replicas (just for indexing).
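For reference, the index setup is roughly this (index and node names are placeholders, not our real ones):

    curl -XPUT 'localhost:9200/our_index' -d '{
      "settings": {
        "index.number_of_shards": 10,
        "index.number_of_replicas": 0,
        "index.routing.allocation.include._name": "percolate_node"
      }
    }'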
There are about 22 million documents in the index and 15,000 percolators. The index is a tad smaller than 11 GB (and so easily fits into memory).
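Each percolator is just a query registered under the index's .percolator type, along these lines (the query itself is a made-up example):

    curl -XPUT 'localhost:9200/our_index/.percolator/query_1' -d '{
      "query": {
        "match": { "body": "some keyword" }
      }
    }'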
About 16 PHP processes talk to the REST API, sending multi percolate requests with 200 percolate actions each (we reduced this from 1000 because of performance).
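The requests use the standard _mpercolate header/doc line format, roughly like this but with 200 pairs per request (index, type, and field names are placeholders):

    curl -XGET 'localhost:9200/_mpercolate' --data-binary '
    {"percolate": {"index": "our_index", "type": "doc"}}
    {"doc": {"body": "first document to percolate"}}
    {"percolate": {"index": "our_index", "type": "doc"}}
    {"doc": {"body": "second document to percolate"}}
    '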
One multi percolate request (a real one, captured from the running PHP processes) takes around 2m20s. That would have been acceptable if one of the resources were fully utilized, but that's the strange thing (see stats output here): load, CPU, memory, heap, I/O; everything is well (very well) within limits. There doesn't seem to be a shortage of resources, yet percolation performance is still bad.
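In case it helps, the figures come from the usual endpoints, something like:

    curl 'localhost:9200/_nodes/stats?pretty'     # heap, GC, thread pools, disk
    curl 'localhost:9200/_nodes/hot_threads'      # where the CPU time actually goes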
When we back off the PHP processes and try the percolate request again, it completes in around 15s. Just to be clear: I don't have a problem with a 2min multi percolate request, as long as I know that one of the resources is fully utilized (so I can act on it).
To rule out network, coordination, etc. issues, we also issued the same request from the node itself (enabling the client on it), under the same pressure from the PHP processes, with the same result.
Finally, we also upped the processors configuration and restarted the node to fake our way to higher resource usage, to no avail. We also tried tweaking the percolate thread pool size and queue (the kind of settings sketched below), but that made no difference either. I truly hope someone here has an answer to this.
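For reference, these are the kinds of settings we mean; the values here are illustrative, not our exact ones:

    # elasticsearch.yml -- example values only
    processors: 48                          # set above the real core count to force more parallelism
    threadpool.percolate.type: fixed
    threadpool.percolate.size: 64           # percolate pool size
    threadpool.percolate.queue_size: 1000   # percolate queue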