Terms Aggregation not being run across all segments


We've recently hit a problem where a terms aggregation is not run across all documents, even though the documents match all of the aggregation filters. This does not happen every time (roughly 10% of runs are affected). Force merging the affected index down to a single segment resolves the issue, but this shouldn't be required.

To give a brief overview:

  • Spark writes 3 documents to Elasticsearch using the saveToEs method.
  • The index is refreshed.
  • Terms aggregation is run.
  • Results are incorrect.
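For anyone trying to reproduce this, here is a rough sketch of how the mismatch could be detected from a search response. It is only an illustration: the field name `category`, the aggregation name `by_term`, and both helper functions are hypothetical and not taken from my actual job. It assumes the 6.x response shape, where `hits.total` is a plain integer, and that every matching document has exactly one value for the aggregated field.

```python
from typing import Any, Dict


def terms_agg_body(field: str, size: int = 10) -> Dict[str, Any]:
    """Build a search body with one terms aggregation on `field`.

    `by_term` is a hypothetical aggregation name used only in this sketch.
    """
    return {
        "size": 0,  # only aggregation results, no hits
        "query": {"match_all": {}},
        "aggs": {"by_term": {"terms": {"field": field, "size": size}}},
    }


def uncounted_docs(response: Dict[str, Any], agg_name: str = "by_term") -> int:
    """Return hits.total minus the documents the terms aggregation counted.

    Assumes hits.total is an integer (Elasticsearch 6.x) and that each
    matching document carries exactly one value for the aggregated field.
    """
    agg = response["aggregations"][agg_name]
    counted = sum(bucket["doc_count"] for bucket in agg["buckets"])
    # Documents in terms that fell outside the returned buckets.
    counted += agg.get("sum_other_doc_count", 0)
    return response["hits"]["total"] - counted
```

A non-zero result from `uncounted_docs` would indicate that some matching documents were not seen by the aggregation, which is the symptom described here.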

I've looked at the segments for the index when it fails to return the correct results. It has 2 segments, one of which appears to contain the document that is missing from the aggregation.
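As a sketch of how the per-segment document counts can be pulled out of a `GET /<index>/_segments` response, something like the following could be used. The helper name and the index name passed to it are hypothetical, and the response layout (`indices` → index → `shards` → list of shard copies → `segments`) is what I understand the segments API to return, so treat it as an assumption.

```python
from typing import Any, Dict, List, Tuple


def segment_doc_counts(
    segments_response: Dict[str, Any], index: str
) -> List[Tuple[str, str, int, int]]:
    """List (shard, segment, num_docs, deleted_docs) for one index.

    `segments_response` is assumed to be the parsed JSON body of
    GET /<index>/_segments.
    """
    rows = []
    shards = segments_response["indices"][index]["shards"]
    for shard_id, copies in shards.items():
        # Each shard maps to a list of copies (primary and replicas).
        for copy in copies:
            for name, seg in copy["segments"].items():
                rows.append((shard_id, name, seg["num_docs"], seg["deleted_docs"]))
    return sorted(rows)
```

Comparing these rows between a good run and a bad run should show which segment holds the document that the aggregation skips.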

Elasticsearch 6.1.0.

I've attached gists for the settings etc. In this case, the document with id 5a5ad8be80f4913e3a7f564fb3dc20b3ab855382 is the one not being returned in the aggregation results. The source is, however, returned if size is set to a non-zero value.

Index Settings







