Aggregation Query Fails Until Restart


(Xandy Johnson) #1

I am new to Elasticsearch and am trying to update a project built on ES 1.3.1 to use 1.7.2. The project runs in multiple steps: it indexes documents in one step ("Step 1") and then queries them using aggregations in the next step ("Step 2"). There is some code that attempts to flush in between. With 1.3.1 this process works, but with 1.7.2 it fails. If I run the Step 2 query outside my Java code (via curl), I still get nothing. However, if I shut down and restart ES and then run it again, I do get results. Even before a restart, a "simple" Date Histogram aggregation on the new data does return results. That query looks like this:

{
  "size" : 0,
  "aggs" : {
    "date_aggregate" : {
        "date_histogram" : {
            "field" : "event_timestamp",
            "interval" : "day"
        }
    }
  }
}

The query that does not return results until an ES restart looks like this:

{
  "size" : 0,
  "aggregations" : {
    "data_source" : {
      "filter" : {
        "bool" : {
          "must" : [ {
            "range" : {
              "event_timestamp" : {
                "from" : "2015-05-15 00:18:54",
                "to" : "2015-09-11 23:51:36",
                "include_lower" : true,
                "include_upper" : false
              }
            }
          }, {
            "term" : {
              "data_source" : "Foo"
            }
          } ]
        }
      },
      "aggregations" : {
        "aoi_id_aggregate" : {
          "terms" : {
            "field" : "aoi_id",
            "size" : 0
          },
          "aggregations" : {
            "date_aggregate" : {
              "date_histogram" : {
                "field" : "event_timestamp",
                "interval" : "day",
                "order" : {
                  "_key" : "desc"
                }
              }
            }
          }
        }
      }
    }
  }
}

I have tried to refresh, sync, and flush the index in between, but to no avail. Please help me find the appropriate way to do this.


(Mark Walkom) #2

How does it fail? Are there any errors?


(Xandy Johnson) #3

There are no errors, just no results, by which I mean that the buckets array is empty, like this:

{
    "_shards": {
        "failed": 0,
        "successful": 5,
        "total": 5
    },
    "aggregations": {
        "data_source": {
            "aoi_id_aggregate": {
                "buckets": [],
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0
            },
            "doc_count": 19331
        }
    },
    "hits": {
        "hits": [],
        "max_score": 0.0,
        "total": 77055
    },
    "timed_out": false,
    "took": 33
}

After I restart, the same query returns buckets broken down by AOI and then by date.


(Xandy Johnson) #4

Further information...

If I close and then open the index (using the RESTful methods shown on https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-open-close.html), I get results. That's less drastic than restarting ES, but still, I think there must be a better way, perhaps even something that would be obvious to someone with more ES experience.
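
For anyone who wants to script the same workaround, here is a minimal sketch of the close-then-open sequence in plain Java, without the Elasticsearch client. The class name `IndexReopen`, its method names, the base URL, and the index name `events` are placeholders made up for illustration; only the `POST /{index}/_close` and `POST /{index}/_open` endpoints come from the page linked above. I have not checked how it behaves if the index is already closed.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper (not part of the project): closes and reopens an index over REST.
class IndexReopen {

    // Builds e.g. http://localhost:9200/events/_close from its parts,
    // tolerating a trailing slash on the base URL.
    static String endpoint(String baseUrl, String indexName, String action) {
        String base = baseUrl.endsWith("/")
                ? baseUrl.substring(0, baseUrl.length() - 1)
                : baseUrl;
        return base + "/" + indexName + "/" + action;
    }

    // Sends a POST with an empty body and returns the HTTP status code.
    static int post(String url) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        try {
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.getOutputStream().close(); // empty request body
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    // Closes the index, then opens it again; fails fast on a non-200 status.
    static void closeAndOpen(String baseUrl, String indexName) throws IOException {
        for (String action : new String[] {"_close", "_open"}) {
            String url = endpoint(baseUrl, indexName, action);
            int status = post(url);
            if (status != 200) {
                throw new IOException("POST " + url + " returned HTTP " + status);
            }
        }
    }
}
```

Between Step 1 and Step 2 this would be called as `IndexReopen.closeAndOpen("http://localhost:9200", "events")`, with the real host and index name substituted.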

Also, when switching from 1.3.1 to 1.7.2, I had to remove the call to .setWaitForMerge(true) from an optimize call between Step 1 and Step 2. Under 1.3.1 it looked like this:

LOG.debug("Flushing and expunging '{}' index", indexName);
getClient().admin().indices().prepareOptimize(indexName)
        .setFlush(true)
        .setOnlyExpungeDeletes(true)
        .setWaitForMerge(true)
        .execute().actionGet();
