We've recently hit a problem where a terms aggregation does not include all matching documents, even though those documents match all of the aggregation's filters. This is intermittent (roughly 10% of runs are affected). Force merging the affected index down to a single segment resolves the issue, but that shouldn't be necessary.
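For reference, the force-merge workaround is essentially the following (the index name is a placeholder):

```
POST /my-index/_forcemerge?max_num_segments=1
```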
To give a brief overview:

1. Spark writes 3 documents to Elasticsearch using the saveToEs method.
2. The index is refreshed.
3. A terms aggregation is run.
4. The results are incorrect.
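For context, the aggregation request is along these lines (the index and field names here are stand-ins; the real settings and query are in the attached gists):

```
GET /my-index/_search
{
  "size": 0,
  "aggs": {
    "my_terms": {
      "terms": { "field": "some_keyword_field" }
    }
  }
}
```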
I've looked into the segments for the index when it fails to return the correct results: it has 2 segments, one of which appears to contain the document that is missing from the aggregation.
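The segment state was inspected with the segments API (index name is a placeholder):

```
GET /my-index/_segments
```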
Elasticsearch 6.1.0.
I've attached gists for the settings etc. In this case it was the document with id 5a5ad8be80f4913e3a7f564fb3dc20b3ab855382 that is not being returned in the aggregation results. The source is, however, returned if size is set to a non-zero value.
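That is, running the same request with a non-zero size returns the document in the hits, while it is still absent from the aggregation buckets (again, index and field names below are placeholders):

```
GET /my-index/_search
{
  "size": 10,
  "aggs": {
    "my_terms": {
      "terms": { "field": "some_keyword_field" }
    }
  }
}
```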