I'm fairly new to ES, and wanted to get some guidance about implementing
something similar to what I've done with other systems. I have a set of
queries, written in a modified version of the Lucene query syntax, that I
use for classifying documents. I would like to tag each new document coming into
my system with the classes generated by my queries.
From my experimentation and reading so far, I assume the best pattern to
use for this purpose is to percolate each new doc against my queries, tag
it outside of ES based on the percolate results, and then index it along
with its tag array.
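A minimal sketch of the client-side half of that percolate, tag, index flow, in Python. It makes no network calls. The request and response shapes are assumptions based on the ES 1.x percolate API (a body of the form {"doc": ...} and a response carrying a "matches" list of {"_index", "_id"} entries), and the function names, the "tags" field, the index name, and the query ids are all hypothetical.

```python
def percolate_request(doc):
    """Body for the first call: the candidate document wrapped in 'doc'
    (assumed ES 1.x percolate request shape)."""
    return {"doc": doc}


def tags_from_matches(response):
    """Turn the ids of the matching registered queries into a sorted tag list.

    Assumes each registered query was stored under an id that doubles as
    the class name, and that the response has a 'matches' list.
    """
    return sorted(match["_id"] for match in response.get("matches", []))


def tagged_document(doc, tags, field="tags"):
    """Return a copy of the document with the tag array attached,
    ready to be sent in the second (index) call."""
    tagged = dict(doc)
    tagged[field] = list(tags)
    return tagged


# Hypothetical document and a hand-written response standing in for
# what the percolate call would return.
doc = {"body": "not very happy with the service"}
response = {
    "total": 2,
    "matches": [
        {"_index": "docs", "_id": "negative-sentiment"},
        {"_index": "docs", "_id": "customer-service"},
    ],
}

tags = tags_from_matches(response)
to_index = tagged_document(doc, tags)
```

The two calls per document are still there: percolate_request(doc) is what would go to the percolate endpoint, and to_index is what would go to the index endpoint afterwards.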
My questions:
- In general, is this the most efficient pattern to use for classification?
  I'm particularly concerned about the round trip of having to use two calls
  for each new document.
- The classification queries can be quite complex and use a mixture of
  nested booleans, wildcards, spans, and regexes. The query string query
  could cover some of this, but I'll likely have to use a variety of nested
  queries to catch things like intensifiers and negations of my terms of
  interest. Any suggestions on good tutorials/tools for efficiently (both in
  sanity and computation) implementing complex queries in ES?
- I think the answer is no, but is there a way to write my tags during
  percolation (e.g. via a script)?
- Are there any efforts in this direction in terms of plugins? I think the
  percolate, tag, index pattern for classification is one that is fairly
  popular.
--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/ad71e333-71a2-422b-90a3-59d0ba093a08%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
I came across a good overview talk on percolation from Martijn van
Groningen. https://www.youtube.com/watch?v=ETxJO2FQ_jw
This answered some of my questions, but I'd still like to get some insight
from others trying to do similar things with document classification and
tagging.
On Tuesday, July 8, 2014 3:49:41 PM UTC+1, Peter Passaro wrote: