In the raw data index, the field is mapped as:
"STATE" : {
  "type" : "text",
  "fields" : {
    "keyword" : {
      "ignore_above" : 256,
      "type" : "keyword"
    }
  }
},
In my case the field has 7 possible values - SA, WA, ACT, VIC, NSW, TAS or NT. I can see their fields STATE and STATE.keyword in the original record.
I used Kibana to build a continuous transform that took the raw data and grouped it by hour. The STATE field was used in the group_by section of the transform:
"pivot": {
  "group_by": {
    "@timestamp": {
      "date_histogram": {
        "field": "@timestamp",
        "calendar_interval": "1h"
      }
    },
    "STATE.keyword": {
      "terms": {
        "field": "STATE.keyword"
      }
    },
...although I was only looking at the Kibana panels before I exported the transform to save it, and I'd selected STATE.keyword as the groupBy field - which seemed reasonable seeing as nothing else in Elastic wants to work directly with text fields.
It took a bit of fiddling about to get an index pattern working for the index that the continuous transform was writing to. I'd also copied and edited the transform to make a second-level transform that read the index from the first (hourly) one and wrote a higher-level, daily, index. That had produced an error about being unable to read STATE.keyword because it didn't have fielddata: true set, so I'd edited the mapping I'd created for the hourly index to add fielddata: true to the odd-looking definition of the STATE field:
"STATE" : {
  "properties" : {
    "keyword" : {
      "type" : "text",
      "fields" : {
        "keyword" : {
          "type" : "keyword",
          "fielddata": true,
          "ignore_above" : 256
        }
      }
    }
  }
},
I then loaded this up, finally got my index pattern for the hourly index properly created (I'd fixed the issue with the @timestamp type that's in another post) and looked at the field in the hourly index using Discover. I saw:
STATE.keyword
STATE.keyword.keyword
So, up until now, the .keyword variant for a field had always been a secondary, derived field, so I'm wondering where the STATE field is and why I've got a double .keyword field? All I wanted/expected in the hourly index was a STATE field (which would have an associated .keyword field).
What Kibana seems to have done, when it composed the transform, was to create a field called STATE with a property called keyword that held the value I wanted. Adding fielddata: true then caused it to produce an additional .keyword field for the .keyword field. That was unexpected and confusing.
So, now that I think I've got a better idea about the screwball it threw me, what's my complaint?
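To make that concrete, here's a rough sketch in plain Python (not Elasticsearch code) of what I think is going on. It assumes that a dotted name such as STATE.keyword is treated as an object path, and that the leaf ends up as a text field with a keyword sub-field, which is what the hourly mapping above looks like. The function names expand_dotted and leaf_paths are made up for illustration.

```python
# Illustration only: mimics how a dotted field name appears to expand into
# an object mapping. DEFAULT_STRING_MAPPING is an assumption modelled on the
# hourly mapping shown above (text plus a keyword sub-field).
DEFAULT_STRING_MAPPING = {
    "type": "text",
    "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
}


def expand_dotted(name, leaf):
    """Turn 'A.b.c' into {'A': {'properties': {'b': {'properties': {'c': leaf}}}}}."""
    parts = name.split(".")
    nested = leaf
    # Wrap the leaf from the innermost path element outwards.
    for part in reversed(parts[1:]):
        nested = {"properties": {part: nested}}
    return {parts[0]: nested}


def leaf_paths(mapping, prefix=""):
    """List the searchable field paths, including multi-field sub-fields."""
    paths = []
    for name, body in mapping.items():
        path = prefix + name
        if "properties" in body:
            # Object field: descend into its properties.
            paths.extend(leaf_paths(body["properties"], path + "."))
        else:
            paths.append(path)
            for sub_field in body.get("fields", {}):
                paths.append(path + "." + sub_field)
    return paths


mapping = expand_dotted("STATE.keyword", DEFAULT_STRING_MAPPING)
print(leaf_paths(mapping))
```

Under those assumptions the two paths that come out are the same two I saw in Discover, with no plain STATE leaf field anywhere.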
Using .keyword as a primary field name is confusing, especially for folks like me who are used to Db2 which, when I tell it to group data by STATE simply gives me a column called STATE. That's sort of what I was expecting elastic to do.
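For comparison, this is roughly the group_by I was expecting to end up with, written as a Python dict so it can be printed as JSON. It rests on my understanding that the key under group_by is what names the field in the destination index, while the inner "field" is what gets read from the source; I haven't verified that against every version. The helper terms_group_by is hypothetical, and the aggregations part of the pivot is left out because it isn't shown above.

```python
import json


def terms_group_by(dest_name, source_field):
    """Build one terms group_by entry.

    dest_name    -- assumed to become the field name in the destination index
    source_field -- the field read from the source index
    """
    return {dest_name: {"terms": {"field": source_field}}}


# Only the group_by part of the pivot; aggregations are omitted here.
pivot = {
    "group_by": {
        "@timestamp": {
            "date_histogram": {
                "field": "@timestamp",
                "calendar_interval": "1h",
            }
        },
        # Output name is plain STATE, source is still the keyword sub-field.
        **terms_group_by("STATE", "STATE.keyword"),
    }
}

print(json.dumps(pivot, indent=2))
```

If that assumption holds, the only difference from the exported transform is the key: STATE instead of STATE.keyword.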
Secondly, if I'm taking the trouble to group my data by a field, it's probably because I want to use it in dashboards or other transforms. Having fielddata: true for it (especially if it was set for the input fields) would probably save a lot of people a lot of work, as it's pretty unusable without it. Once the index is created, you have to extract the mapping, edit it, trash the index (or all of elastic) and then load the mapping before you start the transform.
Finally, a naming scheme that adds an extra .keyword with each level of roll-up transform is something of a scalability/usability concern. By the time I get up to weekly I'm looking at STATE.keyword.keyword.keyword as my value and STATE.keyword.keyword.keyword.keyword as the field I have to use in my visualizations. (I'm using continuous transforms because I wanted more control than rollups offer.)
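A quick sketch of how the names stack up, assuming every level groups by the previous level's .keyword sub-field and keeps that name as the output field (which is what happened to me at the hourly level). The function field_names is just for illustration.

```python
def field_names(base, level, suffix="keyword"):
    """Return (value_field, keyword_field) after `level` chained transforms.

    Assumes each level appends one more suffix to the field that holds the
    value, and that the value field then gets its own keyword sub-field.
    """
    if level < 1:
        raise ValueError("level must be 1 or greater")
    value_field = ".".join([base] + [suffix] * level)
    return value_field, value_field + "." + suffix


for level, label in enumerate(["hourly", "daily", "weekly"], start=1):
    value_field, keyword_field = field_names("STATE", level)
    print(f"{label}: value={value_field} keyword={keyword_field}")
```

Each extra level of transform adds one more .keyword to both names under these assumptions.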