Classification Pattern: Percolate, Tag, Index


(Peter Passaro) #1

I'm fairly new to Elasticsearch and I'm looking for suggestions on the best
pattern to execute something similar to what I've done with other systems.
I have a set of fairly complex queries (for about 10 categories) based on a
slightly modified version of the Lucene query language. For each new
document coming into my system I want to match against those queries, then
tag any matching documents appropriately.

So far, it looks like the pattern I should be using is to percolate each doc
against the classification queries, add an array of classification tags to
the doc outside of ES, and then index the doc with those tags.
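
To make the round trip concrete, here is a rough Python sketch of that
pattern. It is only an illustration under assumptions: it uses the request
shape of the `percolate` query found in Elasticsearch 5.0 and later, it
assumes the classification queries are stored with a percolator field named
`query` and a hypothetical `tag` field holding the category name, and
`search` and `index` are placeholder callables standing in for whatever
client is used, not a real client API.

```python
from typing import Any, Callable, Dict, List

Doc = Dict[str, Any]


def percolate_body(doc: Doc, field: str = "query") -> Doc:
    """Build a percolate request body; `field` is the assumed percolator field."""
    return {"query": {"percolate": {"field": field, "document": doc}}}


def tags_from_hits(response: Doc) -> List[str]:
    """Collect the assumed `tag` value of every stored query that matched."""
    hits = response.get("hits", {}).get("hits", [])
    return sorted({hit["_source"]["tag"] for hit in hits})


def classify_and_index(
    doc: Doc,
    search: Callable[[Doc], Doc],   # hypothetical: runs a search on the query index
    index: Callable[[Doc], Any],    # hypothetical: indexes one document
) -> Doc:
    # Step 1: percolate the incoming doc against the stored queries.
    response = search(percolate_body(doc))
    # Step 2: attach the tag array outside of ES (the original doc is not mutated).
    tagged = dict(doc, tags=tags_from_hits(response))
    # Step 3: index the tagged doc.
    index(tagged)
    return tagged
```

Each document costs two calls here (one search, one index), which is the
round trip I'm asking about in question 1.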

My questions:

  1. Is this the most efficient usage pattern for classification in general?
    I'm particularly concerned about the round trip of having to come out of ES
    to write tags onto docs.

  2. The most straightforward choice for implementing my existing queries is
    the Query String Query, but I'll likely need to combine a couple of different
    query types to get the same functionality as my existing rules. These
    include a complex set of nested booleans and spans (to catch things like
    negations and intensifiers of my terms of interest), wildcards, and
    regexes. Any good sources of advice/tools out there on efficiently building
    and executing complex queries such as these in ES?

  3. I think the answer is no, but is there any way to tag a document (e.g.
    via a script) during percolation?

  4. Is there any way to reduce these to a single call (percolate, tag, and
    index in one go)? I'm thinking about writing a plugin if one doesn't
    already exist.
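
For question 2, this is roughly the shape of rule I have in mind, sketched
as a Python function that builds the query body. It combines a
`query_string` clause with a `span_near` clause that excludes a negator
appearing shortly before the term, and uses `span_multi` to allow a
wildcard inside the span. The field name `body`, the function name, and the
example terms are all made up for illustration; my real rules are more
deeply nested than this.

```python
from typing import Any, Dict


def negation_aware_rule(
    term_pattern: str,
    negator: str,
    field: str = "body",  # hypothetical field name
    slop: int = 2,
) -> Dict[str, Any]:
    """Match docs containing `term_pattern` unless `negator` closely precedes it."""
    # Span clause: negator followed by the (wildcarded) term within `slop` positions.
    negated_span = {
        "span_near": {
            "clauses": [
                {"span_term": {field: negator}},
                # span_multi lets a wildcard query take part in a span query.
                {"span_multi": {"match": {"wildcard": {field: {"value": term_pattern}}}}},
            ],
            "slop": slop,
            "in_order": True,
        }
    }
    return {
        "bool": {
            # The term of interest must be present somewhere in the field...
            "must": [{"query_string": {"default_field": field, "query": term_pattern}}],
            # ...but not when it is negated nearby.
            "must_not": [negated_span],
        }
    }


rule = negation_aware_rule("recommend*", "not")
```

A dict like `rule` would be what gets stored as one classification query.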



(system) #2