Using logstash-filter-csv, unable to aggregate/visualize on integer fields


(rlonberg) #1

Using logstash-filter-csv, some resulting fields are converted to integers in kibana = I am unable to aggregate/visualize on them: 'Field data loading is forbidden on this [fieldname]'
...

Before I begin, I have already referred to the following topic:

So while I am familiar with the root/cause of this problem, I would like to discuss current options for resolving it.

I use a logstash CSV filter to produce various fields from a single-line log message.

These fields are fine to aggregate on / visualize on if they are strings, because a .raw value is created automatically. But kibana will not allow me to aggregate/visualze the fields that are integers, i.e. 'time-taken'

Is there any way for me to set these integer fields to a 'not_analyzed' index without having to go in and temper with the elasticsearch index template?


(Mark Walkom) #2

Nope, that happens at indexing time.

Also, it's best to keep subjects to a summary of the issue, something not so long :slight_smile:


(Magnus Bäck) #3

What do the mappings of your index look like?


(rlonberg) #4

Here is my index template,

{
  "logstash" : {
    "order" : 0,
    "template" : "logstash-*",
    "settings" : {
      "index" : {
        "refresh_interval" : "5s"
      }
    },
    "mappings" : {
      "_default_" : {
        "dynamic_templates" : [ {
          "message_field" : {
            "mapping" : {
              "index" : "analyzed",
              "omit_norms" : true,
              "fielddata" : {
                "format" : "disabled"
              },
              "type" : "string"
            },
            "match" : "message",
            "match_mapping_type" : "string"
          }
        }, {
          "string_fields" : {
            "mapping" : {
              "index" : "analyzed",
              "omit_norms" : true,
              "fielddata" : {
                "format" : "disabled"
              },
              "type" : "string",
              "fields" : {
                "raw" : {
                  "index" : "not_analyzed",
                  "ignore_above" : 256,
                  "type" : "string"
                }
              }
            },
            "match" : "*",
            "match_mapping_type" : "string"
          }
        } ],
        "properties" : {
          "@timestamp" : {
            "type" : "date"
          },
          "geoip" : {
            "dynamic" : true,
            "properties" : {
              "location" : {
                "type" : "geo_point"
              },
              "longitude" : {
                "type" : "float"
              },
              "latitude" : {
                "type" : "float"
              },
              "ip" : {
                "type" : "ip"
              }
            }
          },
          "@version" : {
            "index" : "not_analyzed",
            "type" : "string"
          }
        },
        "_all" : {
          "enabled" : true,
          "omit_norms" : true
        }
      }
    },
    "aliases" : { }
  }
}

So it would seem an update to my index template is my only solution for enabling visualization on these new integer fields that are produced by logstash-filter-csv ('time-taken', 'cs-bytes', 'sc-bytes', 'sc-port', etc)?

Appreciate the help guys.


(Magnus Bäck) #5

I asked for the mappings, not the index template. But yes, the problem is probably that time-taken is a string field. If you modify your Logstash configuration to convert that field into an integer then Elasticsearch will detect the field as an integer and things will work fine, but it never hurts to explicitly set the correct type in the index template too.


(rlonberg) #6

You are right that time-taken is a string field, I am going to modify my logstash configuration (logstash-filter-csv-convert I am assuming) to see if this resolves it, will let you know.

Here are my mappings btw,

"logstash-2016.10.11" : {
    "mappings" : {
      "log" : {
        "_all" : {
          "enabled" : true,
          "omit_norms" : true
        },
        "dynamic_templates" : [ {
          "message_field" : {
            "mapping" : {
              "index" : "analyzed",
              "omit_norms" : true,
              "fielddata" : {
                "format" : "disabled"
              },
              "type" : "string"
            },
            "match" : "message",
            "match_mapping_type" : "string"
          }
        }, {
          "string_fields" : {
            "mapping" : {
              "index" : "analyzed",
              "omit_norms" : true,
              "fielddata" : {
                "format" : "disabled"
              },
              "type" : "string",
              "fields" : {
                "raw" : {
                  "index" : "not_analyzed",
                  "ignore_above" : 256,
                  "type" : "string"
                }
              }
            },
            "match" : "*",
            "match_mapping_type" : "string"
          }
        } ],
        "properties" : {
          "@timestamp" : {
            "type" : "date",
            "format" : "strict_date_optional_time||epoch_millis"
          },
          "@version" : {
            "type" : "string",
            "index" : "not_analyzed"
          },
          "beat" : {
            "properties" : {
              "hostname" : {
                "type" : "string",
                "norms" : {
                  "enabled" : false
                },
                "fielddata" : {
                  "format" : "disabled"
                },
                "fields" : {
                  "raw" : {
                    "type" : "string",
                    "index" : "not_analyzed",
                    "ignore_above" : 256
                  }
                }
              },
              "name" : {
                "type" : "string",
                "norms" : {
                  "enabled" : false
                },
                "fielddata" : {
                  "format" : "disabled"
                },
                "fields" : {
                  "raw" : {
                    "type" : "string",
                    "index" : "not_analyzed",
                    "ignore_above" : 256
                  }
                }
              }
            }
          },
          
          ...............
          "time-taken" : {
            "type" : "string",
            "norms" : {
              "enabled" : false
            },
            "fielddata" : {
              "format" : "disabled"
            },
            "fields" : {
              "raw" : {
                "type" : "string",
                "index" : "not_analyzed",
                "ignore_above" : 256
              }
            }
          },
         ................
        }
      }
    }
  },

(rlonberg) #7

OK,
Modifying the logstash configuration to have the csv filter plugin convert these fields into integers worked for me (showed up as an int in the new index).

change in logstash conf was adding 'convert' field to csv filter:

csv {
columns => ["date","time","s-sitename","s-computername","s-ip","cs-method","cs-uri-stem","cs-uri-query","s-port","cs-username","c-ip","cs-version","cs(User-Agent)","cs(Cookie)","cs(Referrer)","cs-host","sc-status","sc-substatus","sc-win32-status","sc-bytes","cs-bytes","time-taken"]
convert => { "time-taken" => "integer" }
separator => " "
}

This change in my logstash conf resulted in the following mapping change for 'time-taken'

previous:

"time-taken" : {
"type" : "string",
"norms" : {
"enabled" : false
},
"fielddata" : {
"format" : "disabled"
},
"fields" : {
"raw" : {
"type" : "string",
"index" : "not_analyzed",
"ignore_above" : 256
}
}
},

new:

"time-taken" : {
"type" : "long"
},

It was a relief to not have to make any changes to the index template or the mappings, but instead handle this on the logstash configuration side. I can finally aggregate/visualize on these new fields.

Thanks for the help!


(system) #8