How to store percolate queries and documents

It does look like you can index documents as well as percolator queries in the same index, if these examples can be trusted (though that goes against what I thought I knew about percolator indices). I've never done that myself, and there are good reasons not to:

  • For scaling reasons you may want several percolator indices in the cluster. You would then have to index the same document into every one of them in order to percolate it against all the stored queries, which means a lot of extra I/O for the nodes in the cluster.
  • If your document volume is large, the documents will swamp the relatively few percolator queries stored in the index, which could slow down both percolation and regular search (a large index is slower to query or percolate against than a small one).

I try to keep my percolator indices small, with many primary shards that I can distribute across a number of dedicated percolator nodes to spread the percolator workload.
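As a rough illustration of that kind of setup, here is a minimal Python sketch that only builds the request body for creating such an index. The field names `query` and `message`, the shard count, and the node attribute `box_type: percolator` are my own placeholders, not anything from this thread. It assumes a typeless (7.x or later) mapping and that the dedicated nodes have been started with a matching custom node attribute.

```python
from typing import Any, Dict


def percolator_index_body(
    primary_shards: int,
    node_attr: str = "box_type",       # hypothetical custom node attribute name
    node_value: str = "percolator",    # hypothetical value set on the dedicated nodes
) -> Dict[str, Any]:
    """Build a create-index body for a small, dedicated percolator index."""
    return {
        "settings": {
            "index": {
                # Many primaries so the shards can be spread over several nodes.
                "number_of_shards": primary_shards,
                # Shard allocation filtering: only allocate to nodes carrying
                # the given attribute value.
                "routing": {"allocation": {"require": {node_attr: node_value}}},
            }
        },
        "mappings": {
            "properties": {
                # Field holding the stored queries.
                "query": {"type": "percolator"},
                # Fields referenced by the stored queries must be mapped too.
                "message": {"type": "text"},
            }
        },
    }
```

The resulting dict would be sent as the body of the create-index request with whatever client you use. Only the queries live in this index, so it stays small no matter how many documents flow through.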

However, an alternative to storing the documents in a separate index is to percolate them directly as they arrive in your system, get the matches, and then either index them in a suitable result index or just pass the results on to the next service.
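To make the "percolate directly" idea concrete, here is a small sketch in Python. It builds a `percolate` query that carries the incoming document inline, so nothing has to be indexed first, and pulls the IDs of the matching stored queries out of the search response. The field name `query` and the sample document are assumptions for illustration; sending the request is left to whichever client you use.

```python
from typing import Any, Dict, List


def build_percolate_search(document: Dict[str, Any], field: str = "query") -> Dict[str, Any]:
    """Search body that percolates an in-flight document against stored queries.

    `field` is the name of the percolator-typed field in the index
    (assumed to be "query" here).
    """
    return {"query": {"percolate": {"field": field, "document": document}}}


def matched_query_ids(response: Dict[str, Any]) -> List[str]:
    """Extract the IDs of the stored queries that matched the document."""
    return [hit["_id"] for hit in response.get("hits", {}).get("hits", [])]


# Hypothetical incoming document; in practice this comes from your pipeline.
incoming = {"message": "disk usage above 90% on node-3"}
search_body = build_percolate_search(incoming)
```

You would send `search_body` to the `_search` endpoint of the percolator index, pass the response to `matched_query_ids`, and then either store the document together with the matches in a result index or hand the matches to the next service.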
