Cannot make stored percolator query fields searchable (ES 5.0.0-alpha4)

Under ES 2 I have an application for maintaining stored percolator documents. The percolator query field and its subfields are indexed into _all. That works well, because the percolator documents to be edited should be findable by metadata as well as by the terms used in the query.

In ES 5.0.0-alpha4 I cannot make the terms of the percolator query searchable. The "percolator" field data type is not indexed into _all and does not accept other parameters such as "copy_to".

I also tried to index my percolator query field as "object" and then "copy_to" a field of type "percolator"; that was rejected.

Ideas to get that working again are appreciated very much. Thanks.

The answer is "maybe", depending on what you need.

It's true that the new percolator data type doesn't index into the _all field, and doesn't support the copy_to parameter either. It does, however, internally extract some terms to use for faster percolator runtime. That extracted field may contain what you need.

If your percolator field is called "query", you can search the extracted terms via the "query.extracted_terms" multi-field.

The field will contain some of the extracted terms, depending on context. For example, if the query is a bool, it will contain the should clauses; if there is a must clause, it will contain the must clause with the longest terms, and so on. So it's not all the terms in the query, and it doesn't contain the actual query names/types either.

If that field doesn't satisfy your needs, currently the only option is to manually duplicate the query contents into a secondary field in your application (e.g. before registering the query).
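
A minimal sketch of that workaround, done in the application before the document is registered. The helper names and the target field "query_text" are hypothetical; the sketch assumes the percolator field is called "query" and that "query_text" is mapped as an ordinary text field. It simply collects every string value inside the query body (including option values such as the default field), so it is an approximation rather than a real query parser.

```python
from typing import Any, Dict, Iterator


def iter_query_strings(node: Any) -> Iterator[str]:
    """Yield every string value found anywhere inside a query body."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_query_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_query_strings(item)


def with_searchable_query(doc: Dict[str, Any],
                          query_field: str = "query",
                          text_field: str = "query_text") -> Dict[str, Any]:
    """Return a shallow copy of doc with the query's string values
    duplicated into text_field (a hypothetical, ordinary text field)."""
    enriched = dict(doc)
    enriched[text_field] = " ".join(iter_query_strings(doc.get(query_field, {})))
    return enriched


doc = {
    "titel": "Politik | Staatsfinanzen, Steuern",
    "query": {"query_string": {"default_field": "mainqueryRoot",
                               "query": "kanton steuer fdp"}},
}
enriched = with_searchable_query(doc)
print(enriched["query_text"])
```

The enriched document, not the original one, would then be sent to the index, so the query text becomes searchable like any other metadata field.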

Perhaps open a ticket requesting support for copy_to? That seems like it would be a sensible feature, if there isn't a technical blocker to keep it from happening.

Thank you. "query.extracted_terms" did not help in my case: all my attempts returned 0 hits. Maybe this is because I use "query_string" queries, but those proved to be an easy way to get from a "significant_terms" aggregation to a percolator.

I will try to open a feature request.

@mumpi Terms can be extracted from query_string queries on their own, but if range, fuzzy or wildcard operators are used, the percolator is unable to extract terms. Is that the case for all your percolator queries? It would be good to know why no terms are extracted. If terms are extracted, the percolate query can perform much better.
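
As a rough client-side check, the sketch below flags query_string text that uses wildcard, fuzzy or range syntax, the cases named above as preventing term extraction. The helper name and the regular expressions are assumptions for illustration only: they approximate the query_string syntax (escaped characters and quoted phrases are not handled) and say nothing definitive about what the percolator actually extracted.

```python
import re
from typing import List

# Approximate patterns for query_string operators (illustrative, not exhaustive).
_PATTERNS = {
    "wildcard": re.compile(r"[*?]"),
    "fuzzy": re.compile(r"\w~"),
    "range": re.compile(
        r"[\[{][^\]}]*\sTO\s[^\]}]*[\]}]"   # [a TO b] or {a TO b}
        r"|(^|[\s:])[<>]=?\S"               # field:>10, field:<=5
    ),
}


def extraction_blockers(query_text: str) -> List[str]:
    """Return the names of operator kinds that appear in query_text."""
    return sorted(name for name, pattern in _PATTERNS.items()
                  if pattern.search(query_text))


print(extraction_blockers("kanton steuer fdp"))
print(extraction_blockers("steuer* jahr:[2007 TO 2008]"))
```

An empty result only means that none of these approximate patterns matched; it does not prove that terms were extracted.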

Regardless of this I think adding copy_to support to the percolator field type makes sense.

I forgot to mention that the extracted_terms field uses a special format.
Assuming the query match: { foo: bar }, the query.extracted_terms field holds foo\0bar, that is, the field name and the term separated by a NUL character.

So I think it isn't usable at all for this use case... apologies for the confusion.

Below is one of my percolators. How can I make the extracted terms visible?

{
  "_index": "smdperc_de",
  "_type": "percolator",
  "_id": "category:Politik|Staatsfinanzen, Steuern|Staatsfinanzen_Steuern_dt",
  "_score": 0.006514681,
  "_source": {
    "titel": "Politik | Staatsfinanzen, Steuern | Staatsfinanzen_Steuern_dt",
    "taxonomy": "category",
    "quelle": "recommind",
    "bearbeitungsPrio": "1",
    "bearbeitungsStatus": "imported",
    "aktiv": false,
    "la": "de",
    "query": {
      "query_string": {
        "default_field": "mainqueryRoot",
        "default_operator": "OR",
        "query": "kanton steuer finanzausgleich bund finanzminister franken franke frank finanzdirektor steuern steuereinnahme steuersenkung steuersatz besteuern besteuerung steuerpflichtig ausgeben steuerzahler million besteuert steuerpflichtige steuerlich kantonal einkommen steuerverwaltung steuergesetz voranschlag finanzpolitisch budget steuerwettbewerb steuerfuss fdp milliarde steuererhoehung einnahme nfa kuerzung sp parlament entlastung svp steuerbelastung jaehrlich steuerausfall erhoehung defizit mehrwertsteuer beitrag gemeinde geld steuersystem regierung betragen subvention finanzierung betrag ausgabe bundesrat senken entlasten fiskus mehreinnahme prozent kosen bundessteuer budgetieren regierungsrat finanziell budgetiert einkommenssteuer cvp einnehmen vorschlag kosten merzen zahlen einsparung vorlage finanzplan finanzieren sparmassnahme zusaetzlich investition sparen 2007 senke finanzpolitik steuerreform 2008 erhoehen haushalt degressiv bundeshaushalt rat zahl rechnung staatshaushalt steuergesetzrevision reform finanz",
        "minimum_should_match": "10%"
      }
    }
  }
}

The extracted_terms field contains all the query terms from the query_string query, but specially formatted so that it is known which field each query term belongs to.

So for example the query term fdp can be queried via a term query on the extracted_terms sub-field (in JSON the \0 separator has to be written as \u0000):

{
  "term" :  {
      "query.extracted_terms" : "mainqueryRoot\u0000fdp"
  }
}
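
The same term query can be built programmatically. The sketch below is only an illustration of the field\0term format described in this thread: the helper names are hypothetical, and the sub-field name "query.extracted_terms" is taken from the earlier answer, so treat it as an assumption about this alpha release rather than a stable API.

```python
import json
from typing import Any, Dict, Tuple

SEPARATOR = "\0"  # NUL character between field name and term


def encode_extracted_term(field: str, term: str) -> str:
    """Join field name and term the way the thread describes: field\\0term."""
    return f"{field}{SEPARATOR}{term}"


def decode_extracted_term(value: str) -> Tuple[str, str]:
    """Split an encoded value back into (field, term)."""
    field, _, term = value.partition(SEPARATOR)
    return field, term


def extracted_term_query(field: str, term: str,
                         percolator_field: str = "query") -> Dict[str, Any]:
    """Build a term query body against <percolator_field>.extracted_terms."""
    return {"term": {f"{percolator_field}.extracted_terms":
                     encode_extracted_term(field, term)}}


body = extracted_term_query("mainqueryRoot", "fdp")
print(json.dumps(body))
```

Serializing with json.dumps takes care of escaping the NUL character, so the separator does not have to be written by hand.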

However, I doubt that this actually helps you with the setup you had working in ES 2.

So I think the best thing you can do is to tag your documents with terms from the query in a special field. This way you are in full control over how the documents that contain percolator queries can be retrieved.

I give up: I will store the query in an ordinary metadata field and change my editor to also copy the query into the percolator field.

Thank you for your effort.