Which aggregation should be used for transform operations on multifield?

Scylanth · February 28, 2022, 2:09pm

Hello,

I am using Elasticsearch 7.16.3 and I am trying to use a transform to create a summary index from an existing index (basically the same index, but with the timestamp aggregated by day so that it is smaller).

Among the fields I am aggregating, there is a multi-field with this mapping:

"features" : {
	"fielddata" : true,
	"analyzer" : "feature_analyzer",
	"type" : "text",
	"fields" : {
		"keyword" : {
			"type" : "keyword"
		}
	}
}

and this analyzer:

"analysis" : {
	"analyzer" : {
		"feature_analyzer" : {
			"filter" : [
				"uppercase"
			],
			"tokenizer" : "tilde_tokenizer"
		}
	},
	"tokenizer" : {
		"tilde_tokenizer" : {
			"pattern" : """\~""",
			"type" : "simple_pattern_split"
		}
	}
}

For example, if the source value is value1~value2, a terms aggregation on features returns two buckets (value1 and value2, uppercased to VALUE1 and VALUE2 by the analyzer), whereas a terms aggregation on features.keyword returns only one (value1~value2).
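The difference between the two sub-fields can be checked directly with the `_analyze` API. This is a minimal sketch assuming the original index is called `my-source-index` (a placeholder name). The first request should return two tokens, `VALUE1` and `VALUE2`. The second runs the `features.keyword` field's analysis, which should keep the whole value as the single token `value1~value2`.

```console
# "my-source-index" is a hypothetical name for the original index.
# Analyze a sample value with the custom analyzer defined in the index settings.
GET my-source-index/_analyze
{
  "analyzer": "feature_analyzer",
  "text": "value1~value2"
}

# Analyze the same value as the keyword sub-field would index it.
GET my-source-index/_analyze
{
  "field": "features.keyword",
  "text": "value1~value2"
}
```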

I need the transformed index to contain the same field with the same content (that is, the same features value and the same features.keyword value).

The problem is that when I use a terms aggregation on features.keyword, I lose information: the output is no longer a text field, and I only get the non-analyzed value.
A terms aggregation on features does not work either: the resulting fields are null.
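For reference, a pivot transform of the kind described here might look roughly like the sketch below. All names are hypothetical: the transform ID, `my-source-index`, `my-summary-index`, the `@timestamp` field, and the `day` and `events` output names. If the transform creates the destination index itself, it deduces the mapping of the `features` output from the `features.keyword` source field. That should produce a plain keyword field, which matches the behaviour described above.

```console
# Hypothetical names throughout; adjust to the real indices and fields.
PUT _transform/features-daily-summary
{
  "source": { "index": "my-source-index" },
  "dest": { "index": "my-summary-index" },
  "pivot": {
    "group_by": {
      # One bucket per calendar day.
      "day": {
        "date_histogram": { "field": "@timestamp", "calendar_interval": "1d" }
      },
      # Group on the unanalyzed value; the output field is named "features".
      "features": {
        "terms": { "field": "features.keyword" }
      }
    },
    "aggregations": {
      # Number of source documents rolled up into each bucket.
      "events": { "value_count": { "field": "@timestamp" } }
    }
  }
}
```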

What can I do to get the expected results?

casterQ · March 1, 2022, 3:23am

You'd better not enable fielddata on a text field for aggregations, because fielddata on text fields performs poorly (it is loaded into heap memory) once the data volume grows.
Instead, you can split the value at ingest time with the split processor in an ingest pipeline, as below. I hope it meets your needs.

PUT _ingest/pipeline/pp1
{
  "description": "splitby~",
  "processors": [
    {
      "split": {
        "field": "k1",
        "separator": "~"
      }
    }
  ]
}
PUT cc1
{
  "mappings": {
    "properties": {
      "k1": {
        "type": "keyword"
      }
    }
  }
}
POST cc1/_doc?pipeline=pp1
{
  "k1":"ab"
}
POST cc1/_doc?pipeline=pp1
{
  "k1":"a~b"
}
GET cc1/_search
{
  "aggs": {
    "xxx": {
      "terms": {
        "field": "k1",
        "size": 10
      }
    }
  }
}

Scylanth · March 4, 2022, 4:33pm

I can't rely on an ingest pipeline here, for two reasons: first, I need exactly the same mapping as the original index, and second, I do not think transforms use pipelines.

However, I was able to solve the issue by creating the destination index through an index template before starting the transform.
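The fix above can be sketched as an index template that reproduces the source index's analysis settings and `features` mapping for the destination index. The template name and the `my-summary-index*` pattern are hypothetical. The template must exist before the transform creates the destination index. Alternatively, the destination index can be created explicitly with the same settings and mappings, because a transform does not change the mapping of an existing destination index. With this mapping in place, a terms group_by on `features.keyword` whose output is named `features` should write `value1~value2` into a text field analyzed by `feature_analyzer`, with its own `keyword` sub-field. This is presumably why the template approach worked.

```console
# Hypothetical template name and index pattern; match them to the transform's dest index.
PUT _index_template/features-summary-template
{
  "index_patterns": ["my-summary-index*"],
  "template": {
    "settings": {
      "analysis": {
        # Same analyzer and tokenizer as the original index.
        "analyzer": {
          "feature_analyzer": {
            "tokenizer": "tilde_tokenizer",
            "filter": ["uppercase"]
          }
        },
        "tokenizer": {
          "tilde_tokenizer": {
            "type": "simple_pattern_split",
            "pattern": """\~"""
          }
        }
      }
    },
    "mappings": {
      "properties": {
        # Same multi-field as the original: analyzed text plus a keyword sub-field.
        "features": {
          "type": "text",
          "analyzer": "feature_analyzer",
          "fielddata": true,
          "fields": {
            "keyword": { "type": "keyword" }
          }
        }
      }
    }
  }
}
```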

system · April 1, 2022, 4:33pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
