Under ES 2, I have an application for maintaining stored percolator documents. The percolator query field and its subfields are indexed into _all. That's fine, because the percolator documents to be edited should be findable by their metadata as well as by the terms used in their queries.
In ES 5.0.0-alpha4 I could not make the terms of the percolator query searchable. The "percolator" field data type does not index into _all and does not accept any other parameters, such as "copy_to".
I also tried to index my percolator query field as an "object" and then "copy_to" a field of type "percolator", but that was rejected.
Any ideas on how to get this working again would be much appreciated. Thanks.
The answer is "maybe", depending on what you need.
It's true that the new percolator data type doesn't index into the _all field, and doesn't support the copy_to parameter either. It does, however, internally extract some terms to speed up percolation at runtime. That extracted field may contain what you need.
If your percolator field is called "query", you can search the extracted terms via the "query.extracted_terms" multi-field.
The field contains only some of the extracted terms, depending on context (e.g. if the query is a bool, it will contain the should clauses; if there is a must clause, it will contain the must clause with the longest terms, etc.). So it doesn't contain all the terms in the query, and it doesn't contain the actual query names/types either.
If that field doesn't satisfy your needs, currently the only option is to manually duplicate the query contents into a secondary field in your application (e.g. before registering the query).
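A minimal sketch of that workaround, assuming the application stores query_string queries and that the secondary field is a hypothetical "query_text" text field next to a hypothetical "title" metadata field (the mapping type name "queries" is also hypothetical; ES 5.x mappings still take a type name). It only builds the request bodies as Python dicts, so it makes no assumptions about which client library you use.

```python
import json

# Hypothetical mapping: "query" is the percolator field, "query_text" is a plain
# text copy maintained by the application, "title" stands in for your metadata.
MAPPING = {
    "mappings": {
        "queries": {  # hypothetical mapping type name (ES 5.x still uses types)
            "properties": {
                "query": {"type": "percolator"},
                "query_text": {"type": "text"},
                "title": {"type": "text"},
            }
        }
    }
}


def build_percolator_doc(query_string, metadata):
    """Return a document storing the query both as a percolator query and as
    searchable text in the secondary field."""
    doc = dict(metadata)
    doc["query"] = {"query_string": {"query": query_string}}
    # Duplicate the raw query string so its terms stay searchable, as _all was in ES 2.
    doc["query_text"] = query_string
    return doc


def build_admin_search(text):
    """Search stored queries by metadata and by the terms used in the query."""
    return {"query": {"multi_match": {"query": text, "fields": ["title", "query_text"]}}}


if __name__ == "__main__":
    doc = build_percolator_doc("body:fdp AND title:report", {"title": "Weekly report alert"})
    print(json.dumps(doc, indent=2))
    print(json.dumps(build_admin_search("fdp")))
```

The copy has to be kept in sync by the application whenever a query is edited, which is the price of not having copy_to on the percolator field.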
Perhaps open a ticket requesting support for copy_to? That seems like a sensible feature, if there isn't a technical blocker preventing it.
Thank you. The "query.extracted_terms" field did not help in my case: every search I tried returned 0 hits. Maybe this is because I use "query_string" queries, which proved to be an easy way to go from a "significant_terms" aggregation to a percolator.
@mumpi query_string queries on their own can be extracted, but if range, fuzzy, or wildcard operators are used, the percolator is unable to extract terms. Is that the case for all your percolator queries? It would be good to know why no terms are extracted. If terms are extracted, the percolate query can perform much better.
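To answer that question for a large set of stored queries, a rough heuristic scan can help. This is a hypothetical sketch, not how Elasticsearch itself decides: it only looks for the range, fuzzy, and wildcard operators mentioned above in the raw query_string text, and it does not handle escaped characters or every query_string syntax form.

```python
import re

# Rough, hypothetical heuristic. Ignores escaping (e.g. "\*") and quoting rules.
_PATTERNS = {
    "wildcard": re.compile(r"[*?]"),                      # repor*, te?t
    "fuzzy": re.compile(r"\w~\d*"),                       # fdp~, fdp~2 (not "phrase"~2)
    "range": re.compile(r"[\[{][^\]}]*\bTO\b|:\s*[<>]=?"),  # [a TO b], {a TO b}, age:>10
}


def risky_operators(query_string):
    """Return the sorted names of operators that may block term extraction."""
    return sorted(name for name, pattern in _PATTERNS.items() if pattern.search(query_string))


def summarize(queries):
    """Map each stored query_string to the operators it appears to use."""
    return {q: risky_operators(q) for q in queries}


if __name__ == "__main__":
    stored = ["fdp AND report", "repor*", "fdp~2", "date:[2016-01-01 TO 2016-12-31]"]
    for query, ops in summarize(stored).items():
        print(query, "->", ops or "terms should be extractable")
```

A query flagged here is only a candidate; the percolator's actual extraction logic is the authority.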
Regardless of this I think adding copy_to support to the percolator field type makes sense.
I forgot to mention that the extracted_terms field uses a special format.
Given the query match: { foo: bar }, the query.extracted_terms field holds: foo\0bar
So I don't think it is usable for this use case at all... apologies for the confusion.
The extracted_terms field contains all the query terms from the query_string query, but specially formatted so that it is known which field each query term belongs to.
So, for example, the query term fdp can be queried via a term query:
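A minimal sketch of such a term query, built as a Python dict. It assumes the percolator field is called "query" and that the separator is the NUL character shown as \0 above; the field that fdp belongs to is not stated here, so "body" is a hypothetical placeholder.

```python
import json

FIELD_VALUE_SEPARATOR = "\0"  # NUL character, as in "foo\0bar" for match: { foo: bar }


def extracted_term(field, term):
    """Encode a field/term pair the way the extracted_terms field stores it."""
    return f"{field}{FIELD_VALUE_SEPARATOR}{term}"


def extracted_terms_query(field, term, percolator_field="query"):
    """Build a term query against <percolator_field>.extracted_terms."""
    return {"query": {"term": {f"{percolator_field}.extracted_terms": extracted_term(field, term)}}}


if __name__ == "__main__":
    # "body" is a hypothetical field name; json.dumps escapes the NUL as \u0000.
    print(json.dumps(extracted_terms_query("body", "fdp")))
```

Because the field name is baked into each indexed value, you need to know which field a term was queried on before you can search for it.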
However, I doubt this actually helps you with the setup you had working in ES 2.
So I think the best thing you can do is to tag your documents with terms from the query in a special field. This way you're in full control over how your documents containing percolator queries are retrievable.
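A minimal tagging sketch, assuming simple query_string queries and a hypothetical keyword field named "query_tags". The tokenizer is deliberately naive (it splits on whitespace and parentheses, drops AND/OR/NOT and +/- prefixes, and keeps only the term part of field:term), so treat it as a starting point rather than a query_string parser.

```python
import re

_BOOLEAN_OPERATORS = {"AND", "OR", "NOT"}
_TOKEN = re.compile(r"[^\s()]+")  # split on whitespace and parentheses


def query_tags(query_string):
    """Collect unique lowercase terms (without field prefixes) in order of appearance."""
    tags = []
    for token in _TOKEN.findall(query_string):
        if token in _BOOLEAN_OPERATORS:
            continue
        token = token.lstrip("+-")
        # Keep only the term part of "field:term" and drop phrase quotes.
        term = token.split(":", 1)[-1].strip('"').lower()
        if term and term not in tags:
            tags.append(term)
    return tags


def tag_document(doc):
    """Return a copy of a percolator document with a hypothetical "query_tags" field."""
    tagged = dict(doc)
    tagged["query_tags"] = query_tags(doc["query"]["query_string"]["query"])
    return tagged


if __name__ == "__main__":
    doc = {"title": "Weekly report alert",
           "query": {"query_string": {"query": "body:fdp AND (title:Report OR -draft)"}}}
    print(tag_document(doc)["query_tags"])
```

Mapping "query_tags" as a keyword field keeps the tags exact; storing "field:term" instead of just the term is an equally valid choice if you need to filter by field.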