Custom StatisticalFacet not working with multiple nodes

(Kevin-2) #1

I developed a quick addition to the StatisticalFacet that also returns the
median and quartiles of the data we analyze. It does this by keeping track
of every value we process and sorting them at the end. We wrote it when one
node was good enough for us, but we have been moving to a cluster for
stability reasons, and that has introduced an issue in our custom facet.
The plugin is no longer consistent: it tends to return results that look
random, though not absurd. For example, when getting the quartiles, all of
the numbers are in the right order and in roughly the right range, but they
change from one request to the next.

My best guess is that this is related to how elasticsearch aggregates the
data across nodes: there is a situation we aren't accounting for, and we
end up ignoring some of the data. Is this method of obtaining quartiles
possible with elasticsearch? If not, is there another way to get this data?
Below is the reduce method of the facet processor.

public Facet reduce(String name, List<Facet> facets) {
   if (facets.size() == 1) {
       return facets.get(0);
   }
   double min = Double.NaN;
   double max = Double.NaN;
   double total = 0;
   double sumOfSquares = 0;
   long count = 0;
   double quartile25 = Double.NaN;
   double median = Double.NaN;
   double quartile75 = Double.NaN;
   ArrayList<Double> data = new ArrayList<Double>();

    for (Facet facet : facets) {
       if (! {
       CustomFacet statsFacet = (CustomFacet) facet;
       if (statsFacet.min() < min || Double.isNaN(min)) {
           min = statsFacet.min();
       }
       if (statsFacet.max() > max || Double.isNaN(max)) {
           max = statsFacet.max();
       }

       total += statsFacet.total();
       sumOfSquares += statsFacet.sumOfSquares();
       count += statsFacet.count();
   }

   //count = data.size();
   quartile25 = data.get((data.size() / 4));
   median = data.get((2 * data.size() / 4));
   quartile75 = data.get((3 * data.size() / 4));

   return new CustomFacet(name, min, max, total, sumOfSquares, count,
           quartile25, median, quartile75, data);
}
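
The approach described above (keep every value, sort at the end) can only
give exact quartiles on a cluster if the reduce step sees the values from
every facet it is handed. What follows is a minimal standalone sketch of
such a merge, not the plugin's actual code. It assumes each per-shard facet
exposes its values as a list. The names QuartileMerge, mergeAndSort, and
quartiles are hypothetical and are not part of the elasticsearch API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, for illustration only.
class QuartileMerge {

    // Concatenate the value lists from all shards, then sort the result once.
    static List<Double> mergeAndSort(List<List<Double>> perShard) {
        List<Double> merged = new ArrayList<Double>();
        for (List<Double> shardValues : perShard) {
            merged.addAll(shardValues);
        }
        Collections.sort(merged);
        return merged;
    }

    // Same index arithmetic as the reduce method above:
    // size/4, 2*size/4 and 3*size/4 into the sorted list.
    // Returns NaN for all three when there is no data.
    static double[] quartiles(List<Double> sorted) {
        if (sorted.isEmpty()) {
            return new double[] { Double.NaN, Double.NaN, Double.NaN };
        }
        int n = sorted.size();
        return new double[] {
            sorted.get(n / 4),
            sorted.get(2 * n / 4),
            sorted.get(3 * n / 4)
        };
    }
}
```

In this sketch the quartiles are taken only after all lists have been
combined and sorted. Picking quartiles from a single shard's list, or from
an unsorted combined list, would give values in a plausible range that are
not the true quartiles.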

