Coming back to this problem, again...
We upgraded the percolator cluster to 0.19.12 and added one node. In
addition, we upgraded to the latest Sun Java 7 on all nodes. Here's a
bit more detail.
- The calls we make are returning no errors except for an occasional timeout.
- The logs are clear of errors, at least at the default log level; we have
no idea how, or how much, to increase logging.
- Invariably, an index search shows that the filter is in the
_percolator index as it should be (see the sketch after this list).
- When the problem occurs, the percolator starts missing items that
should match that filter.
- After we notice the problem, we do a rolling restart of the cluster
and that clears the problem. NOTE, we do NOT reindex the filters or
make any modification of the filters, we just perform a rolling
restart and that clears it.
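For reference, here is roughly how we verify a filter and test a document
against it. This is only a minimal sketch assuming the 0.19-era percolator
REST API and the Python `requests` library; the host, index, type, filter
name, and document fields are placeholders, not our real data:

    # Hypothetical check: is the filter registered, and does a test doc match?
    import json
    import requests

    ES = "http://localhost:9200"       # placeholder host
    INDEX = "myindex"                  # placeholder target index
    FILTER_NAME = "filter_123"         # placeholder filter name

    # 1. Confirm the filter document is present in the _percolator index.
    r = requests.get("%s/_percolator/%s/%s" % (ES, INDEX, FILTER_NAME))
    print("filter present:", r.json().get("exists"))

    # 2. Percolate a sample document that should match the filter.
    doc = {"doc": {"field1": "value that should match"}}
    r = requests.get("%s/%s/type1/_percolate" % (ES, INDEX),
                     data=json.dumps(doc))
    print("matching filters:", r.json().get("matches"))

Step 1 always shows the filter present; step 2 is where the misses show up
until we restart.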
We really need to understand how the percolator works, since it's
obvious from its behavior that the filters in the index are not
necessarily representative of the filters that are actually being processed.
In simple terms, the percolator itself is getting out of sync with the
contents of the _percolator index and is causing problems.
This out-of-sync bug can happen on all nodes at once, or on just a
few of them. We've seen cases where all of the percolator nodes have the
filter in the index yet fail to percolate until restarted. We've also
seen it where only a portion of the nodes fail to percolate, yet the
filter is in the _percolator index for all of the nodes. The first results
in a total loss of data, the second in only a partial loss.
We have yet to figure out how to reliably reproduce this, but any help
would be greatly appreciated. It's getting to be a real pain. We
will supply any information you need, even access to the cluster
itself (for the ES team) if that would help.
...Thanks,
...Ken
On Sun, Feb 3, 2013 at 8:46 AM, Kenneth Loafman kenneth@loafman.com wrote:
There are 4 nodes per box, each 4GB, one per processor, in order to
give more processing engines to the task. 4GB memory is sufficient to
keep all of the filters in memory and provide some caching.
Essentially the percolate process is CPU and IO bound, not memory
bound.
We check return statuses from all ES calls, and those show no problems either.
I don't know much about Java logging. How do you set the log level,
and what level should I use?
...Thanks,
...Ken
On Fri, Feb 1, 2013 at 4:18 PM, David Pilato david@pilato.fr wrote:
Just a question (I don't have an answer for your concern, since you did not see anything in the logs): do you mean that you host 4 nodes per box? Why don't you start 1 node per box but with 16GB of RAM?
For your problem, perhaps you should raise the log level to debug to see what's going on when you update the percolator?
--
David
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
On Feb 1, 2013, at 15:51, Kenneth Loafman kenneth@loafman.com wrote:
Hi,
We have two clusters, one for search and the other for percolation,
all under 0.19.4 running on Ubuntu Lucid server.
The elasticsearch.yml is here: Percolator Cluster · GitHub
There are 24 4GB nodes spread across 6 machines, all behind a load
balancer. On a fairly regular basis the nodes get out of step with
each other, sometimes losing entire filter sets on some nodes while
maintaining the entire set on other nodes. This causes loss of data
since we don't catch the data that matches if it hits the wrong node.
I've looked at the logs and cannot see any indication of problems.
Each filter set is a set of filters and exclusions, named uniquely.
Percolation matches tell us the index(es) where the data is targeted.
When a filter set is changed, perhaps multiple times per day, the
process is simple: delete all the old filters and add the new ones
(a very small subset of the total data). I suspected that the
delete followed by the add was somehow being applied in the wrong order,
so I added a flush/refresh after the delete step and after the add step.
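The cycle looks roughly like this; a minimal sketch assuming the 0.19-era
percolator REST API and the Python `requests` library, with the host, index,
filter names, and queries as placeholders rather than our real filter sets:

    # Hypothetical update cycle: delete old filters, refresh, add new ones, refresh.
    import json
    import requests

    ES = "http://localhost:9200"   # placeholder host
    INDEX = "myindex"              # placeholder target index

    old_filters = ["set42_a", "set42_b"]                              # placeholder names
    new_filters = {"set42_c": {"query": {"term": {"field1": "foo"}}}}  # placeholder query

    # 1. Delete the old filters from the _percolator index.
    for name in old_filters:
        requests.delete("%s/_percolator/%s/%s" % (ES, INDEX, name))
    requests.post("%s/_percolator/_refresh" % ES)

    # 2. Register the new filters, then refresh again.
    for name, body in new_filters.items():
        requests.put("%s/_percolator/%s/%s" % (ES, INDEX, name),
                     data=json.dumps(body))
    requests.post("%s/_percolator/_refresh" % ES)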
We are still encountering the problem. Any ideas?
...Thanks,
...Ken