LogStash Imports Numbers as Strings into Elastic

I am struggling to get fields imported into Elastic in the correct format. Numbers are being stored as strings, and strings are being sent to a field mapped as an IP address, and I wonder if it is my template. I eventually gave up on getting the IP address stored as a string and started dropping the field when it does not match the IPv4 format, since Elastic was rejecting those events.
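For reference, the drop happens in a conditional along these lines (simplified; the exact check in my config may differ):

filter {
  # drop clientIP unless it looks like a dotted-quad IPv4 address,
  # so Elastic does not reject the whole event
  if [clientIP] and [clientIP] !~ /^\d{1,3}(\.\d{1,3}){3}$/ {
    mutate { remove_field => [ "clientIP" ] }
  }
}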

I delete the indices from Elastic and remove the index pattern from the Kibana settings page.

$ curl -XDELETE 'http://localhost:9200/access_log-*'
{"acknowledged":true}

I import the template. I have switched clientIP back to ip, although I had also tried importing it as a string.

$ curl -XPUT 'http://localhost:9200/_template/access_log?pretty' -d@/etc/elasticsearch/scripts/access_log.template.json
{
  "acknowledged" : true
}

$ curl -s 'localhost:9200/_template/access_log?pretty' | egrep -A2 "bytes|clientIP|status"
          "clientIP" : {
            "index" : "not_analyzed",
            "type" : "ip"
--
          "bytes" : {
            "index" : "not_analyzed",
            "type" : "long"
--
          "status" : {
            "index" : "not_analyzed",
            "type" : "long"

I edit and check the config.

$ vi /etc/logstash/conf.d/kafka/config.json
$ /opt/logstash/bin/logstash -t -f /etc/logstash/conf.d/kafka/config.json
Configuration OK

Here is the grok for the message. I have switched from IPORHOST to DATA for the first two fields since our security scans are putting garbage in them.

"message" => "%{DATA:serverName} %{DATA:clientIP} %{QS:xForwardedFor} (?:%{USER:ident}|-) (?:%{USER:auth}|-) \[%{HTTPDATE:timeStamp}\] \"(?:%{WORD:method} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpVersion})?|%{DATA:rawRequest})\" %{NUMBER:status} (?:%{NUMBER:bytes}|-) (?:%{NUMBER:duration}|-) (?:%{DATA:browserTrackingId}|-) %{QS:referrer} %{QS:agent}"

I convert the number fields to integers.

mutate {
  convert => { 
    "bytes" => "integer"
    "duration" => "integer"
    "status" => "integer"
  }
}
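As an aside, grok can also cast these inline by tagging the capture, e.g.:

%{NUMBER:status:int} (?:%{NUMBER:bytes:int}|-) (?:%{NUMBER:duration:int}|-)

but I am keeping the separate mutate here.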

I output the event to Elastic.

elasticsearch {
  hosts => [...]
  manage_template => false
  index => "access_log-%{+YYYY.MM.dd}"
  document_type => "access_log"
}

When I start LogStash and add the index pattern back into Kibana, I see bytes and status as strings on the Kibana settings page, and they have a 't' next to them on the Discover page.

Here is the template with most of the fields removed for brevity. I import multiple templates, and the section from _default_ through properties is the same in all of them. Not sure if that is an issue.

{
  "template": "access_log-*",
  "order" : 0,
  "mappings": {
    "_default_": {
      "_all": {
        "enabled": false
      },
      "dynamic_templates": [
        {
          "template1": {
            "mapping": { 
              "doc_values":   true,
              "ignore_above": 1024,
              "index":        "not_analyzed",
              "type":         "{dynamic_type}"
            },
            "match": "*"
          }
        }
      ],
      "properties": {
        "@timestamp":               { "type": "date", "format": "strict_date_optional_time||epoch_millis" },
        "@version":                 { "type": "string" },
        ...
        "bytes":                    { "type": "long",   "index": "not_analyzed" },
        "clientIP":                 { "type": "ip",     "index": "not_analyzed" },
        "count":                    { "type": "long" },
        "duration":                 { "type": "long",   "index": "not_analyzed" },
        ...
        "status":                   { "type": "long",   "index": "not_analyzed" },
        ...
      }
    }
  }
}

I would appreciate any help,
Wes.

What's the mapping (not template) look like after you ingest the data?
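For example, something like this (against one of the actual daily indices, or the wildcard) should show it:

$ curl -s 'localhost:9200/access_log-*/_mapping?pretty'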

Sorry for not getting back to you earlier.

I am not too sure what you mean by mapping, but when I look at Indices under Settings I see the type for status is String. When I look in Discover I see a status of 200 with a 't' next to it.

I exported the template again and the type for those fields is still long.

# curl -s 'localhost:9200/_template/access_log?pretty' | egrep -A2 "bytes|clientIP|status"
          "clientIP" : {
            "index" : "not_analyzed",
            "type" : "ip"
--
          "bytes" : {
            "index" : "not_analyzed",
            "type" : "long"
--
          "status" : {
            "index" : "not_analyzed",
            "type" : "long"

I started using the Ruby filter more after I ran into field-name issues while using the kv filter on the query string (the parameter names contained illegal characters). I am now using Ruby to convert the strings to numbers.

@number_arr = ['bytes', 'duration', 'status'];
@number_re = Regexp.new('^[+-]?[0-9]+$');
@number_arr.each { |field| event[field] = event[field].to_i if @number_re.match(event[field]) };
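In the config this lives in a ruby filter, roughly like this (simplified; assuming the setup goes in init and the per-event loop in code):

filter {
  ruby {
    # runs once at startup: fields to convert and the integer pattern
    init => "
      @number_arr = ['bytes', 'duration', 'status']
      @number_re  = Regexp.new('^[+-]?[0-9]+$')
    "
    # runs per event: convert the field to an integer if it looks like one
    code => "
      @number_arr.each { |field| event[field] = event[field].to_i if @number_re.match(event[field]) }
    "
  }
}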

Wes.