Weird field names with Transforms

I used a text field to group my data - STATE, although the input to the grouping was STATE.keyword.

Then I defined a second-level transform that took my hourly data and rolled it up to daily data, also grouping it by state. That refused to run because the STATE.keyword field didn't have fielddata.

I manually created a mapping for the hourly index, specifying fielddata: true for STATE.keyword. Having changed the type of @timestamp from long to date, I can now define an index_pattern to look at my data.

I've got field names of STATE.keyword and STATE.keyword.keyword. :confused:

Is it going to break anything if I change the transform and the mapping so the STATE.keyword field in the hourly index is just called STATE?

I do not fully understand the issue. Can you post your configuration and some example data? The mapping of the source index would be helpful, too.

If you have a keyword subfield in your mapping, it is not necessary to enable fielddata, but you have to explicitly use the keyword subfield in the configuration: STATE.keyword instead of STATE.

Regarding the Transforms UI: You can change output field names as long as you do not have duplicates or conflicts (fields and subfields at the same time).

There is a little icon next to each entry in the UI that lets you change, e.g., the group_by field name.
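
To make the "no duplicates or conflicts" rule a bit more concrete, here is a minimal sketch in Python of the kind of check you could run over your chosen output names before saving a transform. The helper name find_name_conflicts is made up for illustration; it is not part of Kibana or Elasticsearch, and it only looks at the names themselves.

```python
def find_name_conflicts(output_names):
    """Return (parent, child) pairs where one output name is a dotted
    prefix of another, e.g. "STATE" and "STATE.keyword".

    Hypothetical helper for illustration only; it just compares strings.
    """
    conflicts = []
    names = sorted(set(output_names))
    for parent in names:
        for child in names:
            # "STATE.keyword" would need STATE to be an object,
            # while "STATE" needs it to be a plain field.
            if child.startswith(parent + "."):
                conflicts.append((parent, child))
    return conflicts


print(find_name_conflicts(["@timestamp", "STATE", "STATE.keyword"]))
print(find_name_conflicts(["@timestamp", "STATE"]))
```

The first call reports the STATE / STATE.keyword pair, the second reports nothing, so renaming the group_by output to plain STATE is fine as long as nothing else in the same transform is named STATE.something.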

In the raw data index the field has:

         "STATE" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "ignore_above" : 256,
                "type" : "keyword"
              }
            }
          },

In my case the field has 7 possible values: SA, WA, ACT, VIC, NSW, TAS or NT. I can see the fields STATE and STATE.keyword in the original record.

I used Kibana to build a continuous transform that took the raw data and grouped it by hour. The STATE field was used in the group_by section of the transform:

  "pivot": {
    "group_by": {
      "@timestamp": {
        "date_histogram": {
          "field": "@timestamp",
          "calendar_interval": "1h"
        }
      },
      "STATE.keyword": {
        "terms": {
          "field": "STATE.keyword"
        }
      },

...although I was only looking at the Kibana panels before I exported the transform to save it, and I'd selected STATE.keyword as the groupBy field, which seemed reasonable seeing as nothing else in Elastic wants to work directly with text fields.

It took a bit of fiddling about to get an index pattern working for the index that the continuous transform was writing to. I'd also copied and edited the transform to make a second-level transform that read the index from the first (hourly) one and wrote a higher-level, daily index. That had produced an error about being unable to read STATE.keyword because it didn't have fielddata: true set, so I'd edited the mapping I'd created for the hourly index to add fielddata: true to the odd-looking definition of the STATE field:

    "STATE" : {
      "properties" : {
        "keyword" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "fielddata": true,
              "ignore_above" : 256
            }
          }
        }
      }
    },

I then loaded this up, finally got my index_pattern for the hourly index properly created (I'd fixed the issue with the @timestamp type that's in another post), and looked at the field in the hourly index using Discover. I saw:

STATE.keyword
STATE.keyword.keyword

So, up until now, the .keyword variant for a field had always been a secondary, derived field, so I'm wondering where the STATE field is and why I've got a double .keyword field? All I wanted/expected in the hourly index was a STATE field (which would have an associated .keyword field).

What Kibana seems to have done, when it composed the transform, was to create a field called STATE with a property called keyword that held the value I wanted. Adding fielddata: true then caused it to produce an additional .keyword field for the .keyword field. That was unexpected and confusing.

So, now that I think I've got a better idea about the screwball it threw me, what's my complaint?

Using .keyword as a primary field name is confusing, especially for folks like me who are used to Db2 which, when I tell it to group data by STATE simply gives me a column called STATE. That's sort of what I was expecting elastic to do.

Secondly, if I'm taking the trouble to group my data by a field it's probably because I want to use it in dashboards or other transforms, so having fielddata: true for it (especially if it was set for the input fields) would probably save a lot of people a lot of work as it's pretty unusable without it (and, once it's created, you have to extract the mapping, edit it, trash the index (or all of elastic) and then load the mapping before you start the transform).

Finally, a naming scheme that adds an extra .keyword with each level of roll-up transform is something of a scalability/usability concern. By the time I get up to weekly I'm looking at STATE.keyword.keyword.keyword as my value and STATE.keyword.keyword.keyword.keyword as the field I have to use in my visualizations. (I'm using continuous transforms because I wanted more control than rollups offer.)
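
To spell out how the names pile up, here is a rough sketch in Python. It only models the situation described above and rests on two assumptions: each level keeps the suggested output name (the same as the source field), and each destination index maps that output as text with a .keyword subfield. The function name rollup_field_names is made up for this illustration.

```python
def rollup_field_names(levels, raw_field="STATE"):
    """Model how field names grow when every level keeps the suggested name.

    Assumption (not Elasticsearch code): each level groups on the previous
    level's aggregatable field and the output is again text + .keyword.
    """
    rows = []
    agg_field = raw_field + ".keyword"  # aggregatable field in the raw index
    for level in levels:
        value_field = agg_field               # output keeps the source field's name
        agg_field = value_field + ".keyword"  # its keyword subfield in the new index
        rows.append((level, value_field, agg_field))
    return rows


for level, value_field, agg_field in rollup_field_names(["hourly", "daily", "weekly"]):
    print(level, value_field, agg_field)
```

Under those assumptions the weekly row ends up with three .keyword suffixes on the value and four on the field used for aggregations, which is the chain described above.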

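My suggestion for this is to change the name of the group_by entry from STATE.keyword to STATE, while still reading from the STATE.keyword field, so that the pivot becomes: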

  "pivot": {
    "group_by": {
      "@timestamp": {
        "date_histogram": {
          "field": "@timestamp",
          "calendar_interval": "1h"
        }
      },
      "STATE": {
        "terms": {
          "field": "STATE.keyword"
        }
      },

This can be done in the UI as well as using e.g. dev console. Note that the UI only makes a suggestion for a fieldname, it can be changed without a problem.
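
If you generate the configuration outside the UI, the same renaming can be sketched in a few lines of Python. This is only an illustration: the helper names output_name and build_pivot_group_by are made up, and the daily example assumes the hourly index ends up with STATE mapped as keyword, so it can be grouped on directly.

```python
import json

KEYWORD_SUFFIX = ".keyword"


def output_name(source_field):
    """Drop a trailing .keyword so the output field gets the plain name."""
    if source_field.endswith(KEYWORD_SUFFIX):
        return source_field[: -len(KEYWORD_SUFFIX)]
    return source_field


def build_pivot_group_by(state_field, interval):
    """Build the group_by part of a pivot, reading from state_field."""
    return {
        "@timestamp": {
            "date_histogram": {
                "field": "@timestamp",
                "calendar_interval": interval,
            }
        },
        # Output is named STATE, but still aggregates on the given source field.
        output_name(state_field): {"terms": {"field": state_field}},
    }


# First level: raw index, where STATE is text with a keyword subfield.
hourly = build_pivot_group_by("STATE.keyword", "1h")
# Second level: assumes STATE in the hourly index is mapped as keyword.
daily = build_pivot_group_by("STATE", "1d")

print(json.dumps({"pivot": {"group_by": hourly}}, indent=2))
print(json.dumps({"pivot": {"group_by": daily}}, indent=2))
```

Both levels then expose a field called STATE, and no level adds another .keyword to the name.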

I cannot reproduce how you ended up with a chain like STATE.keyword.keyword. The mapping that the transform auto-creates is:

      "properties" : {
        "STATE.keyword" : {
          "type" : "keyword"
        },
        "STATE" : {
          "type" : "object"
        },

I assume you created the mapping manually, is this the case?
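
As background on why the auto-created mapping contains both STATE (object) and STATE.keyword: by default, Elasticsearch treats a dot in a field name as a path into an object. The following Python sketch only imitates that expansion to show the effect; expand_dots is a made-up helper, not how Elasticsearch is implemented.

```python
def expand_dots(flat_doc):
    """Turn {"STATE.keyword": "NSW"} into {"STATE": {"keyword": "NSW"}}.

    Illustrative only: mimics how a dotted field name is read as an object path.
    """
    nested = {}
    for dotted_name, value in flat_doc.items():
        node = nested
        parts = dotted_name.split(".")
        for part in parts[:-1]:
            # Each leading part becomes (or reuses) an object.
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return nested


print(expand_dots({"@timestamp": 1, "STATE.keyword": "NSW"}))
```

So an output named STATE.keyword is stored as an object STATE with a property keyword. If a manual mapping then declares that property as text with its own keyword subfield, the aggregatable field becomes STATE.keyword.keyword.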

Again, the .keyword suffix can be avoided by renaming the suggested field name.

I think however that the name suggestion in the UI can be improved, it should probably not suggest a name with the .keyword suffix. I will bring this up within the development group.

Regarding fielddata: this is only required if you do not have a keyword field; details can be found in the documentation.
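
If you do want to create the mapping for the destination index yourself, a minimal sketch of a body that avoids fielddata altogether could look like the following (written as a Python dict so it can be printed as JSON and pasted into the dev console as the body of a create-index request). It assumes the group_by output has been renamed to STATE; it only covers the two fields discussed in this thread, and any other fields in your data would need their own entries.

```python
import json

# Assumed mapping for the hourly destination index:
# STATE is a plain keyword field, so it can be used in terms
# aggregations directly, and @timestamp is a date rather than a long.
hourly_index_body = {
    "mappings": {
        "properties": {
            "@timestamp": {"type": "date"},
            "STATE": {"type": "keyword"},
        }
    }
}

print(json.dumps(hourly_index_body, indent=2))
```

With STATE mapped as keyword in the hourly index, the second-level transform can group on STATE without any .keyword suffix.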