Percolate performance improvements?

We have been seeing what feels like rather poor performance from our percolate queries. We have 4 indexes with percolate queries, each with a few hundred saved queries. These indexes have a single shard and many replicas (an attempt to spread out the load from the percolate queries). Our cluster is running ES 6.8 and has 27 data nodes, each with 16 CPUs, 52 GB RAM, and 2.8 TB of NVMe storage. When running percolate queries, the highest throughput we have been able to achieve is 1k to 2k documents percolated per second. For perspective, we routinely index over 10k documents a second (the limits usually come from other processing systems, not the ES cluster), and for searching we have seen some of our indexes peak at over 100k searches a second. Outside of percolate, the cluster has acceptable performance.

I was wondering what the best practices are for structuring indexes for maximum percolate performance. Should creating a replica for each node scale performance linearly? Should we dedicate some nodes to just holding the percolate indexes? We have some dynamic templates that define the mapping of the documents to be percolated; should we specify those fields explicitly in the index mapping rather than relying on the dynamic templates? What are the factors that generally slow down percolate performance?

Thank you for any help or guidance.

I'm not sure I can give good advice for your particular case, as my percolator experience is based on a very different use case: we have one large percolate index of 10+ thousand queries, and we typically percolate 10-50 documents at a time (but from multiple clients).

I did a series of benchmark tests a couple of years ago and found a few general guidelines:

  • The more CPU cores available, the better
  • The more nodes, the better
  • Up to a point, the more primary shards the better
  • Surprisingly, replica shards did not improve performance; quite the contrary

As an example, I tested on a 4-node cluster where each node had 32 CPU cores and 30 GB of Java heap space available. The percolate index had 8 primary shards (two per node) and contained 10 thousand queries, some of them fairly complex with many wildcard terms. I ran a test with 5000 documents in batches of 50 and repeated the test five times. The result was an average DPS (docs per second) of 23, which we could live with.
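For anyone who wants to take a comparable measurement, here is a minimal sketch of the bookkeeping only. It assumes you supply your own `percolate_batch` callable (a hypothetical name, not part of any client library) that sends one batch of documents to the cluster; the helper just splits the documents into batches and divides the document count by the elapsed time.

```python
import time


def chunked(docs, size):
    """Yield consecutive batches of at most `size` documents."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


def measure_dps(docs, batch_size, percolate_batch):
    """Send all docs in batches and return documents per second.

    `percolate_batch` is a hypothetical callable supplied by the caller;
    it should send one batch to the cluster and block until it returns.
    """
    started = time.perf_counter()
    for batch in chunked(docs, batch_size):
        percolate_batch(batch)
    elapsed = time.perf_counter() - started
    return len(docs) / elapsed


def average(values):
    """Mean of several runs, e.g. five repetitions of the same test."""
    return sum(values) / len(values)
```

With 5000 documents and a batch size of 50 this issues 100 requests per run; averaging `measure_dps` over several runs gives a figure comparable to the DPS numbers quoted above.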

Interestingly, when I added 1 replica shard (making the cluster contain 8 primary and 8 replica shards) the average DPS dropped from 23 to 15. This happened for all my setups (I ran tests with 2, 4, 8 and 16 primary shards), so I was forced to conclude that replicas reduce performance. By the way, these tests were run on Elasticsearch 5.6 so things may be a bit different with ES6 and now ES7.

I never ran tests with just a few hundred queries in the percolate index so I don't know if your numbers are good or can be improved. But you could try without replica shards while increasing the number of primaries to see if that affects your throughput.
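A minimal sketch of what such an experiment could look like as an index-creation body, written as a Python dict. The field names `query` and `message` and the `_doc` type name are assumptions for illustration, not taken from the setup described in this thread. Note that `number_of_shards` is fixed when the index is created, so trying a different primary count means creating a new index and registering the saved queries again, whereas `number_of_replicas` can be changed on a live index.

```python
def percolator_index_body(primary_shards, replicas=0):
    """Build a create-index request body for a percolator index.

    Assumptions (hypothetical, for illustration only):
      - the percolator field is called "query"
      - percolated documents have a single text field called "message"
      - the 6.x mapping type is named "_doc"
    """
    return {
        "settings": {
            "index": {
                # Fixed at creation time; changing it requires a new index.
                "number_of_shards": primary_shards,
                # Can be updated later through the index settings API.
                "number_of_replicas": replicas,
            }
        },
        "mappings": {
            "_doc": {
                "properties": {
                    "query": {"type": "percolator"},
                    # Explicit mapping for the fields the saved queries use.
                    "message": {"type": "text"},
                }
            }
        },
    }


# Example: 8 primaries, no replicas, as in the benchmark described above.
body = percolator_index_body(8)
```

The resulting dict can be sent as the JSON body of an index-creation request with whichever client you already use.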

Another thing worth mentioning, if you don't already do this: if you are not interested in scores, wrap the percolate query in a constant_score query to speed it up, as described in the "Percolating in a filter context" section of the documentation.
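A minimal sketch of such a request body, built as a Python dict. The percolator field name `query` is an assumption, and the `documents` array form is used here on the understanding that 6.x accepts several documents in one percolate query; with a single document the `document` key would be used instead.

```python
def percolate_in_filter_context(documents, field="query"):
    """Wrap a percolate query in constant_score so no scores are computed.

    `field` is the name of the percolator field in the index mapping;
    "query" is only an assumed default for this example.
    """
    return {
        "query": {
            "constant_score": {
                "filter": {
                    "percolate": {
                        "field": field,
                        "documents": documents,
                    }
                }
            }
        }
    }


# Example: percolate two hypothetical documents in one request.
request_body = percolate_in_filter_context(
    [{"message": "first document"}, {"message": "second document"}]
)
```

Because the percolate query sits in the `filter` clause, every matching saved query comes back with the same constant score, which is fine when only the set of matches matters.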

Good luck!

Fascinating! Thank you for the hint. I never would have expected that. I will have to run some more tests. I assumed that because the index had so few docs that it wasn't worth even thinking about the number of primary shards and assumed that replicas would do most of the work of helping spread the search load (just like in "regular" queries).

It is really interesting to see all the different ways people use these features. As you stated, our use cases are quite literally opposites. We index hundreds of millions of docs against a relatively small number of queries.

During our first benchmarks with the ES 5.X releases we had truly abysmal performance, in the low hundreds of DPS. So instead we wrote application logic to regularly search the most recently indexed documents against our saved queries (a sort of manual, batch percolate system). The performance was reasonable, but it was not something we wanted to keep maintaining, and its performance started to decrease as well.

Thank you. I remember reading that and believe that we wrap all of our queries in a bool filter, but I'll double check.
