We currently have a cluster with 50 million docs running Elasticsearch
version 1.3.2
We were looking for something like a persisted filter, and filtered
aliases, added in version 1.4.0.beta1, seem perfect for it
Our infrastructure team is not happy about upgrading it in production
without extensive testing first, so we will have to run a lot of tests and
upgrade later
We are looking for advice on what can go wrong with this upgrade: what are
the risks?
And also, is there a way to implement a "persistent filter" in our current
version? I mean, some of our users will have access to only a part of our
data; we need something like a database view. We could send a filter in
every request, but that would be too slow with, let's say, 10 million ids.
1.4 changed a lot of things, especially at the distributed system level, so
testing it in your staging environment will certainly help ensure that
things work as expected.
Filtered aliases have been available for a long time (since well before
1.4.0.beta1), so it is very likely that they are already available in the
version you are running. However, a filter containing 10 million ids will
be slow anyway; even if you cache it, the first execution on a new segment
might cause latency spikes, since there are lots of postings lists that
need to be merged. Would it be possible to change it to a simpler term
filter, e.g. by adding more metadata to your documents?
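For anyone following along, registering a filtered alias on 1.x looks
roughly like this (a sketch; the index name, alias name, and user field
below are made up):

```json
POST /_aliases
{
  "actions": [
    {
      "add": {
        "index": "docs",
        "alias": "user_42_docs",
        "filter": { "term": { "allowed_user": 42 } }
      }
    }
  ]
}
```

Searches against user_42_docs then behave like a database view: the term
filter is applied transparently to every request against the alias.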
On Mon, Dec 1, 2014 at 9:23 PM, Roger de Cordova Farias <
roger.farias@fontec.inf.br> wrote:
Looks like I misread the documentation: only the "Fields referred
to in alias filters must exist in the mappings of the index/indices pointed
to by the alias." part was added in 1.4.0.beta1
What we are doing is letting the user "bookmark" the results after doing a
search. Then he can run new searches against his bookmarked docs only.
Adding metadata to our docs with information about who bookmarked them
would work too; it will just be harder to update, because the user can
bookmark/un-bookmark them on the fly and in batches (like "bookmark all
docs of the search result")
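With that metadata approach, a search restricted to one user's bookmarks
could look something like this on 1.x (a sketch; the index name, field
names, and user id here are assumptions):

```json
POST /docs/_search
{
  "query": {
    "filtered": {
      "query": { "match": { "body": "search terms" } },
      "filter": { "term": { "bookmarked_by": "user42" } }
    }
  }
}
```

Batch bookmarking could then be handled with the bulk and update APIs
rather than one request per doc.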
I will study the approaches to see which one fits better for us
I upgraded our logging cluster to 1.4 without any problems.
When I looked into upgrading a separate dev/test instance used for a
different purpose I ran into problems with the plugins. If you are using
plugins, make sure they are supported in 1.4.