Slowness in percolator 7.2

We are migrating from Elasticsearch 1.7 to 7.2.

Our use case for the percolator is pretty simple. We have around 120K percolator queries spread across a 5-node cluster. The percolator docs have meta fields that let us filter out queries that are surely not going to match [1.7 had explicit support for filtering in the percolate API; in 7.2 we have to make use of a bool query].

The index has 48 shards. We have allocated 8GB of heap, giving the young gen 6GB [we saw old gen usage in our previous cluster to be zero]. The machine has 15GB of RAM. We use the G1GC collector [the default] and haven't tweaked anything [we used ParNewGC on 1.7]. Around 40 GCs happen per minute, each taking about 200ms. We also see an -XX:MaxGCPauseMillis setting in the JVM options, which defaults to 200ms. Could this be related? I've also heard that G1GC doesn't work well with smaller heaps [it's better suited to heaps above ~10GB].
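For reference, the GC-related part of our jvm.options looks roughly like this [values illustrative only; I gather that with G1 an explicit -Xmn overrides the adaptive young-gen sizing that works toward the -XX:MaxGCPauseMillis target, which is why fixing it is generally discouraged]:

# jvm.options [illustrative, not a recommendation]
-Xms8g
-Xmx8g
# G1 with the default 200ms pause target
-XX:+UseG1GC
-XX:MaxGCPauseMillis=200
# explicitly sizing the young gen like this disables G1's
# adaptive sizing that works toward the pause target
-Xmn6g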

Anyways,

What I want to know is: how does ES optimise queries on its end, and how much does the ordering of filters matter? [We benchmarked filter ordering and did see a bit of benefit, but not much at peak time.] This is how we do percolation:

{
  "query": {
    "bool": {
      "filter": [
        { "term": { "domain": <> } },
        { "percolate": { } }
      ]
    }
  }
}

We keep the domain filter first in the hope that ES will filter out unnecessary percolation.

One other significant change we've made to our ES queries is the following:

{
  "query": {
    "bool": {
      "filter": [
        { "term": { ... } },
        { "term": { ... } },
        { "term": { ... } }
      ]
    }
  }
}

now looks like

{
  "query": {
    "bool": {
      "filter": [
        { "bool": { "filter": { "term": { ... } } } },
        { "bool": { "filter": { "term": { ... } } } },
        { "bool": { "filter": { "term": { ... } } } }
      ]
    }
  }
}

This just simplified ES query generation from our home-grown logic spec. But does having these unnecessary bool wrappers affect the way Lucene executes the queries? I've heard ES builds some sort of intermediate representation of queries, so I'm guessing it optimises certain things. What exactly would they be? Most of our queries are either term queries or match queries on keywords/text/numbers/booleans/dates.

Other common settings like memory lock are all in place. This cluster is used exclusively for percolation, so there is no other traffic. Our load is around 200 QPS.

I'm afraid I can't answer your questions directly, but I've run some benchmark tests on a dedicated percolator cluster. This was a while back (ES 5.6.4) but the observations I made may be of use to you.

  • In my tests, the more CPU cores each node had available, the more QPS the cluster could serve.
  • The number of shards had an optimal value related to the number of CPU cores; any more or fewer shards lowered the QPS.
  • If you don't need the match score of each document, using a constant_score filter may improve the QPS (see Percolating in a filter context).
  • I also tried percolating with a range of Java heap sizes but didn't reach a clear conclusion, possibly because GC played the more significant role in the QPS values.
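For completeness, the filter-context variant I mean looks roughly like this [the index name, the percolator field name "query", and the document fields are placeholders from my tests, not your setup]:

GET /my-percolator-index/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "percolate": {
          "field": "query",
          "document": { "message": "example text to percolate" }
        }
      }
    }
  }
}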

I hope this may be of help in your case. Good luck!

The percolator changed a lot between those versions. Since Elasticsearch 5.0 the percolator no longer keeps all the queries in memory, but loads them on demand from the index they are stored in. To improve performance, an automatic extraction of conditions to pre-filter percolate queries was added. You should check whether this extraction failed for your queries, as described here: https://www.elastic.co/guide/en/elasticsearch/reference/7.3/query-dsl-percolate-query.html#how-it-works.
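Assuming your percolator field is mapped as "query" [the index name here is a placeholder], you can find the queries where extraction failed with a term query on the built-in query.extraction_result field:

GET /your-percolator-index/_search
{
  "query": {
    "term": {
      "query.extraction_result": "failed"
    }
  }
}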

Ya, at first I thought that too. But the extraction failed for only about 3000 percolator queries; that shouldn't affect it a lot.

I was about to ask: what are the performance implications of the queries no longer being held in memory? Some docs also mention that percolator queries are not cached. What about that? I guess this and that are the same thing @Daniel_Penning

You should run a query with profiling enabled. This will show how Elasticsearch executes your query.
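With the query from your first post, that would look something like this [the index name, "domain" value, percolator field name, and document fields are placeholders for your actual setup]:

GET /your-percolator-index/_search
{
  "profile": true,
  "query": {
    "bool": {
      "filter": [
        { "term": { "domain": "example.com" } },
        {
          "percolate": {
            "field": "query",
            "document": { "message": "example" }
          }
        }
      ]
    }
  }
}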

This article is also relevant to understand how percolator queries are executed and how they interact with other queries.

The extracted terms are matched against the document given in the percolate query during the first query phase, which handles conjunctions with other term queries very well. This gives an approximation of which documents could match the query. For all those candidates, the percolator has to load and parse the query that was stored in the index. If the approximation phase finds many possible matches, Elasticsearch has to create lots of short-lived objects, which would explain why so many garbage collections are needed. This should show up in the profile as high nextDoc/advance counts and low match counts.

Have you also tried smaller young gen sizes with G1 or ParNewGC?

@Daniel_Penning we've found the root cause of our problem

We can't use ParNewGC because we are on JDK 12. We are experimenting with G1GC and ParallelOldGC. If the objects are short-lived, why would I want a large, unused old gen space? I'd rather give that space to the young gen instead, right? Am I missing anything?

One other thing: the ES docs say the mini (in-memory) index used for percolation is optimized for that purpose. My mappings have doc_values set to true [the default]; I'm guessing ES doesn't spend time extracting doc_values for that single document that gets indexed into the mini index [for percolation].