Confused about how to use .raw fields and not analyze string fields


(Mackenza) #1

I apologize if this is too newbie a question but I am struggling with setting up a terms visualization in Kibana 4 because the string field with the terms is analysed.

I am using logstash with the elasticsearch output to populate my index. In my searching around the web, my understanding is that the default logstash template for ES creates a multi-field for each string field where one of the fields is .raw. So somewhere in my index is the ability to use a not_analyzed version of my field. I think I get that.

What I am 100% struggling with is how to make that .raw field show up in my field list in Kibana? I have searched around for it and there are vague references to "make the .raw known to Kibana and refresh field list" but I can't for the life of me figure out how to do that. I have also seen references to creating a new template for Logstash to replace the default but there hasn't been enough detail for me to figure that out.

I am hoping someone with patience can help me out here? Thanks in advance.


(Tanya Bragin) #2

What Kibana version are you using?

Assuming it's Kibana 4, you won't see the raw values in "Settings >> Indices" and "Discover", but you should see them in metric selection of "Visualize" (see screenshot). The reason for that is that only Visualize deals with aggregations, where raw values matter.

If you don't see them there, I'd suggest the following:

  • Make sure you're using the default indexing template that comes with Logstash (you shouldn't have to modify it).
  • If you do have to re-index your data, refresh your mappings in Kibana (or delete and re-create the index pattern).


(Mackenza) #3

Thank you for the reply. I don't see those raw fields as you show. I am
wondering if it's because my indices are not of the pattern logstash-*? I
will rename my indices and see.

Again tyvm


(Tanya Bragin) #4

Index name shouldn't matter, to be honest.


(Mackenza) #5

It did... turns out the default elasticsearch template in Logstash has a
filter on it of logstash-*. When I changed the index names, all was good
and I can see the .raw fields now.

Thanks for your help!


(Tanya Bragin) #6

Cool! That makes sense. Thanks for the update :smile:


(David Reagan) #7

I have a similar issue. Did you change the default index pattern in the Kibana 4 settings to logstash-*? Or did you change something in the template that Logstash uses when it sends data to Elasticsearch?


(Mackenza) #8

@jerrac I had to rename my indices to match logstash-*. This meant changing my Logstash config to output to indices with that name pattern: where my Elasticsearch output index had been events, I changed it to logstash-events. Kibana can only register indices that exist, so just changing Kibana isn't enough.

The reason is that the default template that ships with Logstash only adds the raw fields to indices that match the pattern logstash-*. You could change your default template to match all indices... but that is much harder than just renaming them to match :wink:
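
To make the pattern filter concrete, here is a minimal Python sketch. It assumes the template pattern is a plain `*` wildcard and uses the standard-library `fnmatch` module to approximate how an index name is compared against it; the function name and the sample index names are only illustrative.

```python
from fnmatch import fnmatchcase

# Pattern used by the default Logstash template, as described above.
TEMPLATE_PATTERN = "logstash-*"


def template_applies(index_name, pattern=TEMPLATE_PATTERN):
    """Return True if index_name matches the wildcard pattern (an approximation)."""
    return fnmatchcase(index_name, pattern)


if __name__ == "__main__":
    # "events" is skipped by the template; the renamed indices are covered.
    for name in ["events", "logstash-events", "logstash-2015.12.01"]:
        print(name, template_applies(name))
```

An index called events falls outside the pattern, so it never receives the .raw sub-fields, while logstash-events does.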


(David Reagan) #9

Hmm... All my indices are already named logstash-YYYY-MM-DD.

I do have some .raw fields. So maybe this isn't my problem. Thanks for the reply. :smile:


(Mackenza) #10

I see. Specifically, my problem was that I did not see any fieldname.raw fields in Kibana. This was a direct result of not naming my indices with the logstash-* pattern. As soon as I renamed them and rebuilt them in Elasticsearch, I was fine.


(David Reagan) #11

Heh. Well, it was user error on my part :blush: . I needed to refresh my window...


(John) #12

In my case, it does not look like Logstash ships the template over to Elasticsearch when using the HTTP ES plugin. I don't even have a logstash template on my ES index. Not sure why, or what the _template is supposed to be?


#13

Is that the only option, naming them logstash-*?


#14

Bump.


(Mackenza) #15

The alternative is to change the default template to not implement that filter. I have no idea how that is done, though. I think I saw it one time on StackOverflow but I just went with the flow and renamed my indices.


#16

Thanks for the info!

Anybody else know how to change that?


(Vincent Tran) #17

If you don't want to change your index names, you can add a new index template whose pattern matches your indices.

For example, if my indices are psm-*, I would PUT /_template/psm1 with this body:

     {
         "order" : 0,
         "template" : "psm-*",
         "settings" : {
             "index" : {
                 "refresh_interval" : "5s"
             }
         },
         "mappings" : {
             "_default_" : {
                 "dynamic_templates" : [{
                         "message_field" : {
                             "mapping" : {
                                 "index" : "analyzed",
                                 "omit_norms" : true,
                                 "type" : "string",
                                 "fields" : {
                                     "raw" : {
                                         "ignore_above" : 256,
                                         "index" : "not_analyzed",
                                         "type" : "string"
                                     }
                                 }
                             },
                             "match_mapping_type" : "string",
                             "match" : "message"
                         }
                     }, {
                         "string_fields" : {
                             "mapping" : {
                                 "index" : "analyzed",
                                 "omit_norms" : true,
                                 "type" : "string",
                                 "fields" : {
                                     "raw" : {
                                         "ignore_above" : 256,
                                         "index" : "not_analyzed",
                                         "type" : "string"
                                     }
                                 }
                             },
                             "match_mapping_type" : "string",
                             "match" : "*"
                         }
                     }
                 ],
                 "_all" : {
                     "omit_norms" : true,
                     "enabled" : true
                 },
                 "properties" : {
                     "geoip" : {
                         "dynamic" : true,
                         "type" : "object",
                         "properties" : {
                             "location" : {
                                 "type" : "geo_point"
                             }
                         }
                     },
                     "@version" : {
                         "index" : "not_analyzed",
                         "type" : "string"
                     }
                 }
             }
         },
         "aliases" : {}
     }

Now this is a pretty boilerplate template and you can customize it further to fit your needs. But the important stanza is:

     {
         "string_fields" : {
             "mapping" : {
                 "index" : "analyzed",
                 "omit_norms" : true,
                 "type" : "string",
                 "fields" : {
                     "raw" : {
                         "ignore_above" : 256,
                         "index" : "not_analyzed",
                         "type" : "string"
                     }
                 }
             },
             "match_mapping_type" : "string",
             "match" : "*"
         }
     }

This gives every string field in the matching indices an extra "raw" sub-field that is not analyzed, because "match" : "*" matches every field name. You can narrow that wildcard if you only want certain string fields to have a "raw" sub-field.
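
As a rough illustration of narrowing the match, the sketch below builds a stripped-down template body in Python. The function name `raw_subfield_template` and the `host*` field pattern are hypothetical, chosen only for this example; the JSON keys are the ones from the stanza above.

```python
import json


def raw_subfield_template(index_pattern, field_match="*"):
    """Build a template body that adds a not_analyzed "raw" sub-field to
    dynamically mapped string fields whose names match field_match."""
    return {
        "template": index_pattern,
        "mappings": {
            "_default_": {
                "dynamic_templates": [
                    {
                        "string_fields": {
                            "match": field_match,
                            "match_mapping_type": "string",
                            "mapping": {
                                "type": "string",
                                "index": "analyzed",
                                "omit_norms": True,
                                "fields": {
                                    "raw": {
                                        "type": "string",
                                        "index": "not_analyzed",
                                        "ignore_above": 256,
                                    }
                                },
                            },
                        }
                    }
                ]
            }
        },
    }


if __name__ == "__main__":
    # Hypothetical: only fields whose names start with "host" get a .raw sub-field.
    print(json.dumps(raw_subfield_template("psm-*", field_match="host*"), indent=2))
```

The printed JSON could then be sent as the body of the template PUT request; it only affects indices created after the template is stored.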


#18

Hi, I have created an index like this:

<connectionString value="Server=10.30.1.63;Index=logstash;Port=9200;rolling=true"/>

and then I use the following fields:

<param name="ConversionPattern" value="%date - %level - %message %property{mstimeload} %property{applicationid} %property{applicationid} %property{page} 
           %property{ipclient} %property{browser} %property{browsersignature} %property{appversion} %property{sessionuniquecodetag} %property{globalcountertailsloaded} 
           %property{ipserveraddress} %newline" />

And in Kibana I don't see any .raw fields. I would like some of the fields to be "not_analyzed". My index looks like this:

{
  "logstash-2015.12.01": {
    "aliases": {},
    "mappings": {
      "logEvent": {
        "properties": {
          "className": {
            "type": "string"
          },
          "domain": {
            "type": "string"
          },
          "exception": {
            "type": "object"
          },
          "fileName": {
            "type": "string"
          },
          "fix": {
            "type": "string"
          },
          "fullInfo": {
            "type": "string"
          },
          "hostName": {
            "type": "string"
          },
          "identity": {
            "type": "string"
          },
          "level": {
            "type": "string"
          },
          "lineNumber": {
            "type": "string"
          },
          "loggerName": {
            "type": "string"
          },
          "message": {
            "type": "string"
          },
          "messageObject": {
            "type": "object"
          },
          "methodName": {
            "type": "string"
          },
          "properties": {
            "properties": {
              "@timestamp": {
                "type": "date",
                "format": "strict_date_optional_time||epoch_millis"
              },
              "applicationid": {
                "type": "string"
              },
              "appversion": {
                "type": "string"
              },
              "browser": {
                "type": "string"
              },
              "browsersignature": {
                "type": "string"
              },
              "ipclient": {
                "type": "string"
              },
              "ipserveraddress": {
                "type": "string"
              },
              "log4net:HostName": {
                "type": "string"
              },
              "log4net:Identity": {
                "type": "string"
              },
              "log4net:UserName": {
                "type": "string"
              },
              "mstimeload": {
                "type": "string"
              },
              "page": {
                "type": "string"
              },
              "sessionuniquecodetag": {
                "type": "string"
              }
            }
          },
          "threadName": {
            "type": "string"
          },
          "timeStamp": {
            "type": "date",
            "format": "strict_date_optional_time||epoch_millis"
          },
          "userName": {
            "type": "string"
          }
        }
      }
    },
    "settings": {
      "index": {
        "creation_date": "1448964809178",
        "number_of_shards": "5",
        "number_of_replicas": "1",
        "uuid": "fFRoaMN4QDSFNCta4lV_QQ",
        "version": {
          "created": "2000099"
        }
      }
    },
    "warmers": {}
  }
}

How can I do it?


(Vincent Tran) #19

As this is a logstash-* index, I'm surprised that it is not using the default Logstash dynamic template (which gives you a .raw field for each string field). You can modify the template I posted above by changing "template" : "psm-*" to "template" : "logstash-*". Then any new indices created with a name matching logstash-* will start using that template, and all your string fields will have a .raw field. I am not sure about the nested properties field in your index, though: the wildcard match might handle the nested string fields, or it might not.
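
One way to check the result, nested fields included, is to walk the mapping and list the string fields that still lack a raw sub-field. The following is only a sketch: the helper name `string_fields_without_raw` is made up, and the sample dict is a trimmed, hypothetical mapping in the same shape as the one posted above, not real output.

```python
def string_fields_without_raw(properties, prefix=""):
    """Yield dotted paths of string fields that have no "raw" sub-field."""
    for name, spec in properties.items():
        path = prefix + name
        if "properties" in spec:
            # Object field: descend into its nested properties.
            yield from string_fields_without_raw(spec["properties"], path + ".")
        elif spec.get("type") == "string" and "raw" not in spec.get("fields", {}):
            yield path


if __name__ == "__main__":
    # Hypothetical sample: "message" already has a raw sub-field, the rest do not.
    sample = {
        "level": {"type": "string"},
        "message": {
            "type": "string",
            "fields": {"raw": {"type": "string", "index": "not_analyzed"}},
        },
        "properties": {
            "properties": {
                "browser": {"type": "string"},
                "@timestamp": {"type": "date"},
            }
        },
    }
    print(sorted(string_fields_without_raw(sample)))
```

Feeding it the "properties" object of a mapping type from a newly created index would show whether the nested string fields picked up the sub-field.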

I'm also curious to see why the default logstash template is not being used here for your index. Can you run this and provide us with the output (formatted or Github Gist would be nice)?

GET /_template


#20

It looks like my default template is packetbeat-*... How is this possible??

{
  "packetbeat": {
    "order": 0,
    "template": "packetbeat-*",
    "settings": {
      "index": {
        "refresh_interval": "5s"
      }
    },
    "mappings": {
      "_default_": {
        "dynamic_templates": [
          {
            "template1": {
              "mapping": {
                "ignore_above": 1024,
                "index": "not_analyzed",
                "type": "{dynamic_type}",
                "doc_values": true
              },
              "match": "*"
            }
          }
        ],
        "_all": {
          "norms": {
            "enabled": false
          },
          "enabled": true
        },
        "properties": {
          "request": {
            "norms": {
              "enabled": false
            },
            "index": "analyzed",
            "type": "string"
          },
          "client_location": {
            "type": "geo_point"
          },
          "response": {
            "norms": {
              "enabled": false
            },
            "index": "analyzed",
            "type": "string"
          },
          "query": {
            "index": "not_analyzed",
            "type": "string",
            "doc_values": true
          },
          "params": {
            "norms": {
              "enabled": false
            },
            "index": "analyzed",
            "type": "string"
          },
          "timestamp": {
            "type": "date"
          }
        }
      }
    },
    "aliases": {}
  }
}