How to store percolate queries and documents

Hi, I'm trying to understand how and where to store documents and the percolate queries for those documents.

According to this documentation, it's possible to create a new field in the index mapping with the percolator data type. The index must also define mappings for the fields we use in the percolate queries.

Since ES 6.x removes the concept of types, one option is to store documents and queries in a single index. That sounds good, but I have a couple of questions:

  1. When we store queries, only the query field will be set and all other fields will be empty. Is that right, and is it okay?

  2. How should we generate document IDs? For example, if we store documents with IDs 1, 2, 3, and so on, what IDs should we use for the percolate queries?

Another option is to store percolate queries in a different index, but then that index needs the same mapping as the index for our documents, and it may not be easy to keep the mappings in sync across both indices.

Which option is best? Is there some other way?

Thanks for your answers in advance!

No, this isn't possible since the percolator index must be of type "percolator". Store percolator queries and documents in separate indices.

Yes, you can and should use the same mapping for both; see this forum thread for how to do this.

@psylone I'm the one who asked the question linked. I did implement it in two different indices, where both have similar mappings. The solution works great.

It's a bit of a pain to maintain almost the same mapping in two different indexes, but I guess that's a consequence of the new one type per index approach. I believe the benefits of this are superior to the downside of maintaining a mapping in two places.

I'm surprised to hear that. I use the exact same json-file to create the mapping for all the document indices as for the percolator indices.

The only extra thing the percolator indices need is "type": "percolator", and I handle that through a separate json-file where I define an order 1 template with index_patterns matching only the names of my percolator indices. With this setup I maintain just one mapping, used for both documents and percolators, plus one small percolator settings file.

The pain point probably arises when you define your mapping inside the application with an Elasticsearch client library. In that case, if you modify the mapping, for example by adding a new field that you plan to use in percolate queries, you also have to add the same field to the percolate queries index.

Would love to see examples of your document mapping and your percolator mapping.

That is probably true, but then you have the added complexity of application maintenance since you may have multiple applications using the same index.

I prefer to extract the mapping responsibility out of the apps and use the mapping like a DTD that the apps must obey in order to use the common indices.

It's really no magic :)

Let's assume I store my documents in an index named "myorg_docs" while percolators are stored in "myorg_perc". Then I create a general index mapping, defining all the allowed document fields and values, using "index_patterns" : "myorg*" to match the names of both types of indices. By using "order": 0 this general mapping will be applied first.

Then, since I need to define the percolator type, I create a small mapping of order 1, so that it gets applied after the order 0 mapping, and using "index_patterns": "myorg_perc*" so that it only gets applied to percolator indices, not the document indices. This mapping will typically look something like this:

{
    "order": 1,
    "index_patterns": "myorg_perc*",
    "mappings": {
        "myorg": {
            "properties": {
                "query": {
                    "type": "percolator"
                }
            }
        }
    }
}

And that's all there is to it - one large mapping with all the document fields and one tiny matching only the percolator indices.
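To make the layering concrete, here is a minimal Python sketch that approximates how the two templates combine. It is a local simulation, not an Elasticsearch call: the `message` field in the order 0 template is a placeholder (the real document fields are not shown in this thread), `index_patterns` is written as a list, and `fnmatchcase` only approximates Elasticsearch's wildcard matching. The helper names `deep_merge` and `effective_mappings` are made up for illustration.

```python
from copy import deepcopy
from fnmatch import fnmatchcase

# Hypothetical order 0 template: shared document fields for every "myorg*" index.
# "message" is a placeholder field, not taken from the real mapping.
BASE_TEMPLATE = {
    "order": 0,
    "index_patterns": ["myorg*"],
    "mappings": {"myorg": {"properties": {"message": {"type": "text"}}}},
}

# Order 1 template from the post above: adds the percolator field only
# to indices whose names start with "myorg_perc".
PERCOLATOR_TEMPLATE = {
    "order": 1,
    "index_patterns": ["myorg_perc*"],
    "mappings": {"myorg": {"properties": {"query": {"type": "percolator"}}}},
}


def deep_merge(base, override):
    """Return a new dict in which values from `override` win over `base`."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def effective_mappings(index_name, templates):
    """Approximate the mappings a new index would get from its matching templates."""
    matching = [
        t for t in templates
        if any(fnmatchcase(index_name, p) for p in t["index_patterns"])
    ]
    result = {}
    # Lower orders are applied first, so higher orders override them.
    for template in sorted(matching, key=lambda t: t["order"]):
        result = deep_merge(result, template["mappings"])
    return result


templates = [PERCOLATOR_TEMPLATE, BASE_TEMPLATE]
docs_mapping = effective_mappings("myorg_docs", templates)
perc_mapping = effective_mappings("myorg_perc", templates)
print(sorted(docs_mapping["myorg"]["properties"]))
print(sorted(perc_mapping["myorg"]["properties"]))
```

Under these assumptions, only the index matching `myorg_perc*` ends up with the `query` field, while both indices share the fields from the order 0 template.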

Naturally, you can split this up over several orders (I have up to 4 levels in some clusters) where each level of mapping becomes more specialized, overriding settings in the order 0 or order 1 mappings, allowing me to tweak certain indices while leaving most with the default mapping.

That's awesome. Thanks!

Yeah, I thought the same way, but from the documentation:

PUT /my-index
{
    "mappings": {
        "_doc": {
            "properties": {
                "message": {
                    "type": "text"
                },
                "query": {
                    "type": "percolator"
                }
            }
        }
    }
}

and the same index is then used to percolate a stored document:

PUT /my-index/_doc/2
{
  "message" : "A new bonsai tree in the office"
}

GET /my-index/_search
{
    "query" : {
        "percolate" : {
            "field": "query",
            "index" : "my-index",
            "type" : "_doc",
            "id" : "2",
            "version" : 1 
        }
    }
}

It's probably just an example, but since the percolator mapping type was introduced, it is possible to use a single index for both documents and percolate queries. I checked it just now: es-6-percolate-api.sh · GitHub
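For reference, the single-index flow can be sketched as a sequence of requests. The Python below only builds the request bodies and does not talk to a cluster. The `bonsai tree` match query and the `query-` ID prefix are assumptions for illustration: Elasticsearch does not require any prefix, but since queries and documents share one ID space in a single index, some convention is needed to keep their IDs from colliding.

```python
import json

INDEX = "my-index"  # index name from the documentation example


def store_query(query_id, query):
    """Request that stores a percolator query as a regular document."""
    # Hypothetical "query-" prefix keeps query IDs apart from document IDs.
    return ("PUT", f"/{INDEX}/_doc/query-{query_id}", {"query": query})


def store_document(doc_id, message):
    """Request that stores an ordinary document; its "query" field stays unset."""
    return ("PUT", f"/{INDEX}/_doc/{doc_id}", {"message": message})


def percolate_stored(doc_id):
    """Request that percolates an already indexed document by its ID."""
    body = {
        "query": {
            "percolate": {
                "field": "query",
                "index": INDEX,
                "type": "_doc",
                "id": str(doc_id),
            }
        }
    }
    return ("GET", f"/{INDEX}/_search", body)


request_sequence = [
    store_query(1, {"match": {"message": "bonsai tree"}}),
    store_document(2, "A new bonsai tree in the office"),
    percolate_stored(2),
]

for method, path, body in request_sequence:
    print(method, path)
    print(json.dumps(body, indent=2))
```

Query documents carry only the `query` field and ordinary documents carry only their own fields, which is the situation the first question in this topic asks about.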

What's more interesting, the 5.6 documentation for the same topic puts the percolator field in a separate mapping type:

PUT /my-index
{
    "mappings": {
        "doctype": {
            "properties": {
                "message": {
                    "type": "text"
                }
            }
        },
        "queries": {
            "properties": {
                "query": {
                    "type": "percolator"
                }
            }
        }
    }
}

All of this in the docs confuses me a little, which is why I asked the question.

@Bernt_Rostad, @Thomas_Ardal what do you think?

It does look like you can index documents as well as percolator queries in the same index, if we can trust these examples (but this goes against what I thought I knew of percolator indices). I've never done that and there are good reasons not to.

  • For scaling reasons you may want several percolator indices in the cluster. You would then have to index the same document in all of them in order to percolate it against all percolators, which means a lot of extra I/O for the nodes in the cluster.
  • If your document volume is large, the documents will swamp the relatively few percolator queries stored in the index, which could slow down both percolation and querying (a large index is slower to query or percolate against than a small one).

I try to keep my percolator indices small, with many primary shards that I can spread across a number of dedicated percolator nodes to spread the percolator workload.

However, an alternative to storing the documents in a separate index is to just percolate them directly, as they arrive in your system, to get the matches and then index them in a suitable result index or just pass on the results to the next service.
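A minimal sketch of that direct approach, assuming the percolator field is named `query` as in the mappings in this thread: the percolate query accepts the document inline under `document`, so nothing has to be indexed first. The Python below only builds the search body and reads match IDs from a response-shaped dict. The helper names and the sample response are hypothetical.

```python
def percolate_inline(document, field="query"):
    """Build a search body that percolates a document not stored in any index."""
    return {"query": {"percolate": {"field": field, "document": document}}}


def matched_query_ids(search_response):
    """Collect the IDs of the stored queries that matched, from a search response dict."""
    return [hit["_id"] for hit in search_response["hits"]["hits"]]


body = percolate_inline({"message": "A new bonsai tree in the office"})

# Hypothetical, trimmed response used only to show the shape being read.
sample_response = {"hits": {"hits": [{"_index": "myorg_perc", "_id": "1"}]}}
matched = matched_query_ids(sample_response)
print(body)
print(matched)
```

The body would be sent as a search against the percolator index, and the matched IDs can then be written to a result index or handed to the next service.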

I totally agree. I was just curious about the doc examples, so I commented here: Docs: percolate refers to multiple types per index · Issue #34056 · elastic/elasticsearch · GitHub

Thanks!