Weird field names with Transforms

mykael · June 25, 2020, 4:19am

I used a text field to group my data - STATE, although the input to the grouping was STATE.keyword.

Then I defined a second-level transform that took my hourly data and rolled it up to daily data, also grouping it by state. That refused to run because the STATE.keyword field didn't have fielddata.

I manually created a mapping for the hourly index, specifying fielddata: true for STATE.keyword. Having changed the type of @timestamp from long to date, I can now define an index_pattern to look at my data.

I've got field names of STATE.keyword and STATE.keyword.keyword.

Is it going to break anything if I change the transform and the mapping so the STATE.keyword field in the hourly index is just called STATE?

Hendrik_Muhs · June 26, 2020, 6:24am

I do not fully understand the issue. Can you post your configuration and some example data? The mapping of the source index would be helpful, too.

If you have a keyword subfield in your mapping, it is not necessary to enable fielddata, but you have to explicitly use the keyword subfield in the configuration: STATE.keyword instead of STATE.

Regarding the Transforms UI: You can change output field names as long as you do not have duplicates or conflicts (fields and subfields at the same time).

There is a little icon to change e.g. the group_by field name.

mykael · June 27, 2020, 1:48pm

In the raw data index the field has:

    "STATE" : {
      "type" : "text",
      "fields" : {
        "keyword" : {
          "ignore_above" : 256,
          "type" : "keyword"
        }
      }
    },

In my case the field has 7 possible values - SA, WA, ACT, VIC, NSW, TAS or NT. I can see the fields STATE and STATE.keyword in the original record.

I used Kibana to build a continuous transform that took the raw data and grouped it by hour. The STATE field was used in the group_by section of the transform:

  "pivot": {
    "group_by": {
      "@timestamp": {
        "date_histogram": {
          "field": "@timestamp",
          "calendar_interval": "1h"
        }
      },
      "STATE.keyword": {
        "terms": {
          "field": "STATE.keyword"
        }
      },

...although I was only looking at the Kibana panels before I exported the transform to save it, and I'd selected STATE.keyword as the groupBy field - which seemed reasonable seeing as nothing else in elastic wants to work directly with text fields.

It took a bit of fiddling about to get an index pattern working for the index that the continuous transform was writing to. I'd also copied and edited the transform to make a second-level transform that read the index from the first (hourly) one and wrote a higher-level, daily index. That had produced an error about being unable to read STATE.keyword because it didn't have fielddata: true set, so I'd edited the mapping I'd created for the hourly index to add fielddata: true to the odd-looking definition of the STATE field:

    "STATE" : {
      "properties" : {
        "keyword" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "fielddata": true,
              "ignore_above" : 256
            }
          }
        }
      }
    },

I then loaded this up, finally got my index_pattern for the hourly index properly created (I'd fixed the issue with the @timestamp type that's in another post) and looked at the field in the hourly index using Discover. I saw:

STATE.keyword
STATE.keyword.keyword

So, up until now, the .keyword variant for a field had always been a secondary, derived field, so I'm wondering where the STATE field is and why I've got a double .keyword field? All I wanted/expected in the hourly index was a STATE field (which would have an associated .keyword field).

What Kibana seems to have done, when it composed the transform, was to create a field called STATE with a property called keyword that held the value I wanted. Adding fielddata: true then caused it to produce an additional .keyword field for the .keyword field. That was unexpected and confusing.
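One way to picture this, as a rough sketch and not Elasticsearch's actual code: a dot in a field name is read as an object path, so a group_by named STATE.keyword ends up in the destination document as an object STATE with a property keyword. The helper name `expand_dotted` and the sample values below are made up for illustration.

```python
def expand_dotted(flat):
    """Expand keys such as 'STATE.keyword' into nested dicts (illustration only)."""
    nested = {}
    for dotted_name, value in flat.items():
        node = nested
        # Everything before the last dot is treated as a chain of parent objects.
        *parents, leaf = dotted_name.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


# Hypothetical hourly bucket, keyed by the group_by names from the transform.
hourly_doc = expand_dotted({"@timestamp": 1593043200000, "STATE.keyword": "SA"})
print(hourly_doc)
```

If the destination index then maps that inner keyword property as text with its own keyword subfield, as in the manually created mapping above, the subfield shows up as STATE.keyword.keyword.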

So, now that I think I've got a better idea about the screwball it threw me, what's my complaint?

Using .keyword as a primary field name is confusing, especially for folks like me who are used to Db2 which, when I tell it to group data by STATE simply gives me a column called STATE. That's sort of what I was expecting elastic to do.

Secondly, if I'm taking the trouble to group my data by a field it's probably because I want to use it in dashboards or other transforms, so having fielddata: true for it (especially if it was set for the input fields) would probably save a lot of people a lot of work as it's pretty unusable without it (and, once it's created, you have to extract the mapping, edit it, trash the index (or all of elastic) and then load the mapping before you start the transform).

Finally, a naming scheme that adds an extra .keyword with each level of roll-up transform is something of a scalability/usability concern. By the time I get up to weekly I'm looking at STATE.keyword.keyword.keyword as my value and STATE.keyword.keyword.keyword.keyword as the field I have to use in my visualizations. (I'm using continuous transforms because I wanted more control than rollups offer.)

Hendrik_Muhs · June 29, 2020, 7:34pm

My suggestion for this is to change:

mykael:

  "pivot": {
    "group_by": {
      "@timestamp": {
        "date_histogram": {
          "field": "@timestamp",
          "calendar_interval": "1h"
        }
      },
      "STATE.keyword": {
        "terms": {
          "field": "STATE.keyword"
        }
      },

to

  "pivot": {
    "group_by": {
      "@timestamp": {
        "date_histogram": {
          "field": "@timestamp",
          "calendar_interval": "1h"
        }
      },
      "STATE": {
        "terms": {
          "field": "STATE.keyword"
        }
      },

This can be done in the UI as well as using e.g. dev console. Note that the UI only makes a suggestion for a fieldname, it can be changed without a problem.
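For anyone editing an exported transform by hand, the rename could be scripted along these lines. This is only a sketch under assumptions: `rename_group_by` is a hypothetical helper, not part of any Elasticsearch client, and the pivot below contains only the group_by part quoted in this thread (the aggregations are left out because the thread elides them). The conflict check mirrors the rule mentioned earlier: no duplicates, and no name that is both a field and a parent of another field.

```python
import copy
import json


def rename_group_by(pivot, old_name, new_name):
    """Return a copy of pivot with one group_by output name changed.

    Only the output name changes; the source "field" inside the spec is kept.
    """
    group_by = pivot["group_by"]
    if old_name not in group_by:
        raise KeyError(old_name)
    for name in group_by:
        if name == old_name:
            continue
        # Reject duplicates and field/subfield clashes such as STATE vs STATE.x.
        if (name == new_name
                or name.startswith(new_name + ".")
                or new_name.startswith(name + ".")):
            raise ValueError(f"{new_name!r} conflicts with {name!r}")
    renamed = copy.deepcopy(pivot)
    renamed["group_by"] = {
        (new_name if name == old_name else name): spec
        for name, spec in renamed["group_by"].items()
    }
    return renamed


hourly_pivot = {
    "group_by": {
        "@timestamp": {
            "date_histogram": {"field": "@timestamp", "calendar_interval": "1h"}
        },
        "STATE.keyword": {"terms": {"field": "STATE.keyword"}},
    }
}

fixed_pivot = rename_group_by(hourly_pivot, "STATE.keyword", "STATE")
print(json.dumps(fixed_pivot, indent=2))
```

The printed group_by keeps reading from STATE.keyword but writes the result to a plain STATE field, which is the shape suggested above.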

I cannot reproduce how you ended up with a chain like STATE.keyword.keyword. The mapping that the transform auto-creates is:

      "properties" : {
        "STATE.keyword" : {
          "type" : "keyword"
        },
        "STATE" : {
          "type" : "object"
        },

I assume you created the mapping manually, is this the case?

Again, the .keyword suffix can be avoided by renaming the suggested field name.

I think however that the name suggestion in the UI can be improved, it should probably not suggest a name with the .keyword suffix. I will bring this up within the development group.

Regarding fielddata: This is only required if you do not have a keyword field; details can be found in the documentation.
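As a rough illustration of that rule, the sketch below picks the field to use in a terms group_by from a mapping's properties. The function name `terms_field` is hypothetical, and it only inspects top-level properties; the sample mapping is the raw index mapping posted earlier in this thread.

```python
def terms_field(properties, name):
    """Return the field name to aggregate on, or None if no keyword is available."""
    field = properties.get(name)
    if field is None:
        return None
    if field.get("type") == "keyword":
        # Already a keyword field: use it directly.
        return name
    # Otherwise look for a keyword multi-field such as STATE.keyword.
    for sub_name, sub_field in field.get("fields", {}).items():
        if sub_field.get("type") == "keyword":
            return f"{name}.{sub_name}"
    # Text with no keyword subfield: this is the case where fielddata comes in.
    return None


raw_properties = {
    "STATE": {
        "type": "text",
        "fields": {"keyword": {"ignore_above": 256, "type": "keyword"}},
    }
}

print(terms_field(raw_properties, "STATE"))
```

For the raw index this selects STATE.keyword, so fielddata is not needed there.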

system · July 27, 2020, 7:34pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
