Which aggregation should be used for transform operations on a multi-field?

Hello,

I am using Elasticsearch 7.16.3 and I am trying to use a transform to build a summary index from an existing index (basically the same index, but with the timestamp aggregated by day so that it is smaller).

Among the fields I am aggregating, there is a multi-field with this mapping:

"features" : {
	"fielddata" : true,
	"analyzer" : "feature_analyzer",
	"type" : "text",
	"fields" : {
		"keyword" : {
			"type" : "keyword"
		}
	}
}

and this analyzer:

"analysis" : {
	"analyzer" : {
		"feature_analyzer" : {
			"filter" : [
				"uppercase"
			],
			"tokenizer" : "tilde_tokenizer"
		}
	},
	"tokenizer" : {
		"tilde_tokenizer" : {
			"pattern" : """\~""",
			"type" : "simple_pattern_split"
		}
	}
}

For example, if the source value is value1~value2, a terms aggregation on features returns two buckets (value1 and value2), while a terms aggregation on features.keyword returns only one (value1~value2).
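
To check how the analyzer splits a value, the _analyze API can be run against the field. This is only a sketch: my-source-index is a placeholder for the real index name. With the mapping and analyzer shown above, the response should list two tokens, and because of the uppercase filter they should come back uppercased.

```
# my-source-index is a hypothetical name; replace it with the actual source index
POST my-source-index/_analyze
{
  "field": "features",
  "text": "value1~value2"
}
```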

In my transformed index, I need the same field with the same content (the same features value and the same features.keyword value).

The problem is that when I use a terms aggregation on features.keyword, I lose information: the resulting field is no longer a text field, so I only get the non-analyzed value.
A terms aggregation on features does not work either: I get null fields.

What can I do to get the expected results?

It is better not to enable fielddata on a text field for aggregations, because fielddata on text fields performs poorly once the amount of data grows.
You can use an ingest pipeline with a split processor, as below. I hope it meets your needs.

PUT _ingest/pipeline/pp1
{
  "description": "splitby~",
  "processors": [
    {
      "split": {
        "field": "k1",
        "separator": "~"
      }
    }
  ]
}
PUT cc1
{
  "mappings": {
    "properties": {
      "k1": {
        "type": "keyword"
      }
    }
  }
}
POST cc1/_doc?pipeline=pp1
{
  "k1":"ab"
}
POST cc1/_doc?pipeline=pp1
{
  "k1":"a~b"
}
GET cc1/_search
{
  "aggs": {
    "xxx": {
      "terms": {
        "field": "k1",
        "size": 10
      }
    }
  }
}

I can't rely on ingest here for two reasons: first, I need exactly the same mapping as the original index, and second, I do not think that transform operations use pipelines.

However, I was able to solve my issue by creating the index with an index template before launching the transform operation.
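
For anyone landing here, a minimal sketch of what this could look like. It is not my exact configuration: the template name, the transform id, the index names (features-source, features-summary), the timestamp field name and the doc_count aggregation are all placeholders I made up for illustration. The idea is that the template gives the destination index the same analyzer and multi-field mapping as the source, and the transform groups on features.keyword but writes the raw value into a field named features, so it gets re-analyzed at index time.

```
# Template carrying the same analysis settings and mapping as the source index
# (template name and index pattern are hypothetical)
PUT _index_template/features_summary_template
{
  "index_patterns": ["features-summary*"],
  "template": {
    "settings": {
      "analysis": {
        "analyzer": {
          "feature_analyzer": {
            "tokenizer": "tilde_tokenizer",
            "filter": ["uppercase"]
          }
        },
        "tokenizer": {
          "tilde_tokenizer": {
            "type": "simple_pattern_split",
            "pattern": """\~"""
          }
        }
      }
    },
    "mappings": {
      "properties": {
        "timestamp": { "type": "date" },
        "features": {
          "type": "text",
          "analyzer": "feature_analyzer",
          "fielddata": true,
          "fields": {
            "keyword": { "type": "keyword" }
          }
        }
      }
    }
  }
}

# Create the destination index up front so it picks up the template
PUT features-summary

# Daily pivot; the group_by key "features" becomes the destination field name
PUT _transform/features_daily_summary
{
  "source": { "index": "features-source" },
  "dest": { "index": "features-summary" },
  "pivot": {
    "group_by": {
      "timestamp": {
        "date_histogram": { "field": "timestamp", "calendar_interval": "1d" }
      },
      "features": {
        "terms": { "field": "features.keyword" }
      }
    },
    "aggregations": {
      "doc_count": {
        "value_count": { "field": "features.keyword" }
      }
    }
  }
}

POST _transform/features_daily_summary/_start
```

Since the destination index already exists when the transform starts, the transform writes into it instead of creating one with deduced mappings, which is why the text field and its keyword sub-field both end up populated.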
