Here's an interesting problem.
Our system is built on Elasticsearch and uses the rolling-index
technique: we have one index per time period. This approach is
also known in this forum as "indexing logs".
We use the percolation feature of Elasticsearch, and are interested in
percolating while indexing. The problem with doing that in this setup is
that percolator queries are registered against a specific index, and that's
not necessarily the index we are indexing to.
The immediate solution is to register all queries against every index we
have in the system, but that's just ridiculous: queries are updated and
removed over time, and every change would have to be repeated on each index.
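To make the cost concrete, here is a minimal sketch that only builds the registration requests and sends nothing. It assumes the 0.x-era REST convention of registering a query with a PUT to `/_percolator/{index}/{query name}`; the `logs-YYYY.MM.DD` naming scheme, the two sample queries and both helper functions are hypothetical.

```python
from datetime import date, timedelta

def rolling_index_names(prefix, start, days):
    """One index per day, e.g. logs-2013.01.28 (naming scheme is an assumption)."""
    return [f"{prefix}-{(start + timedelta(days=i)):%Y.%m.%d}" for i in range(days)]

def registration_requests(indices, queries):
    """Build one PUT per (index, query) pair; nothing is sent over the network."""
    return [
        ("PUT", f"/_percolator/{index}/{name}", {"query": query})
        for index in indices
        for name, query in queries.items()
    ]

indices = rolling_index_names("logs", date(2013, 1, 28), days=30)
queries = {
    "errors": {"term": {"level": "error"}},
    "timeouts": {"term": {"message": "timeout"}},
}
puts = registration_requests(indices, queries)
print(len(puts))  # 30 indices x 2 queries = 60 requests
```

Every added, changed or removed query repeats this fan-out across all live indices, and every newly created index needs the full query set copied into it.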
What we want is a pseudo-index that all queries can be associated with,
so that they are loaded by the percolator whenever an appropriate flag is
set in the percolation / indexing request. A wildcard-based solution could
work as well.
I'm looking at this line:
indexName there should be set to that pseudo-index all queries are
registered against.
There may be better ways to do that - we would appreciate feedback from the
ES devs and the community. If this is the best way to go, we would be happy
to provide a pull request adding that feature.
Since every index has the same mapping, I would suggest creating an empty
index and registering your percolator queries against it.
Instead of adding the percolate parameter to the index request, you could
simply query your percolator index to check whether your document matches
one or more queries.
But anyway, there is no way to percolate and index at the same time with
rolling indexes.
(In reply to the post above by Itamar Syn-Hershko, Monday 28 January 2013,
07:44:22 UTC-5.)
This is what we do now, but it's twice the work. Hence my question about the
best way to make this possible while indexing, with rolling indexes as well.
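As a rough illustration of why the workaround is twice the work, the sketch below builds the requests needed per document in each case, without sending them. It assumes the 0.x-era REST API (`GET /{index}/{type}/_percolate` with a `doc` body, and the `percolate=*` parameter on an index request); the index name `percolator-home`, the `event` type and both helper functions are hypothetical.

```python
def two_step_requests(doc, doc_id, rolling_index,
                      query_index="percolator-home", doc_type="event"):
    """Workaround: percolate against a fixed index, then index elsewhere."""
    return [
        ("GET", f"/{query_index}/{doc_type}/_percolate", {"doc": doc}),
        ("PUT", f"/{rolling_index}/{doc_type}/{doc_id}", doc),
    ]

def one_step_request(doc, doc_id, rolling_index, doc_type="event"):
    """Percolate while indexing: only sees queries registered for rolling_index."""
    return [("PUT", f"/{rolling_index}/{doc_type}/{doc_id}?percolate=*", doc)]

doc = {"level": "error", "message": "timeout talking to db"}
for method, path, _body in two_step_requests(doc, "1", "logs-2013.01.29"):
    print(method, path)
```

The single-request form halves the round trips, but it is only useful if the queries are visible from whichever rolling index happens to be current, which is exactly what is missing.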
I ended up with what I believe to be a simple and elegant solution,
although I might have gotten some of the naming wrong.
Here's a pull request:
What I did was define a new "system index", which I named _global.
Registering queries against it ensures they participate in all
percolation operations within the cluster, along with the queries
registered against the index being percolated.
To make this work you need to create an index named "_global" and give
it some mapping. Since this is intended to be used in clusters employing
the "rolling indices" pattern, it is safe to assume the mapping of _global
will match the mapping of any other index in the system, and this is what
I did.
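The lookup rule can be modelled in a few lines. This is a toy in-memory sketch of the behaviour described above, not the code in the pull request: plain Python predicates stand in for Elasticsearch queries, the `PercolatorRegistry` class is hypothetical, and letting an index-specific query override a _global query of the same name is my assumption.

```python
GLOBAL_INDEX = "_global"

class PercolatorRegistry:
    """Toy model: queries are predicates stored per index name."""

    def __init__(self):
        self._queries = {}  # index name -> {query name -> predicate}

    def register(self, index, name, predicate):
        self._queries.setdefault(index, {})[name] = predicate

    def percolate(self, index, doc):
        # Start from the _global queries, then add those of the target index.
        candidates = dict(self._queries.get(GLOBAL_INDEX, {}))
        candidates.update(self._queries.get(index, {}))
        return sorted(name for name, matches in candidates.items() if matches(doc))

registry = PercolatorRegistry()
registry.register(GLOBAL_INDEX, "errors", lambda d: d.get("level") == "error")
registry.register("logs-2013.01.28", "from-web", lambda d: d.get("host") == "web1")

doc = {"level": "error", "host": "web1"}
print(registry.percolate("logs-2013.01.28", doc))  # ['errors', 'from-web']
print(registry.percolate("logs-2013.01.29", doc))  # ['errors']
```

A brand-new rolling index with no queries of its own still matches everything registered against _global, so nothing has to be copied when the index rolls over.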
Comments welcome.
Itamar.
(Sent as a follow-up to Itamar Syn-Hershko's message of Tue, Jan 29, 2013 at
6:10 PM, above.)