Logstash 7.10 is ignoring custom index template

Hi,

This is a fresh installation of ELK 7.10.0. I am trying to configure Logstash to process .csv files and index them into ES. I created a config file and a custom index template. It looks like the .csv files are being processed, but with the default auto-mapping or some other mapping template, not the one I need to use. All fields are being indexed as text, including the date and IP fields, which is not how I defined them in my custom template. There are no errors in the logs, and the config file and the custom template appear to be processed correctly, so what is failing? I have spent a lot of time trying to find what I could be missing or doing wrong, without luck so far.

Can somebody please help me to fix this?

ts_reports.conf

input {
    file {
        id => "TS_Reports"
        path => "/opt/ts_reports/*.csv"
        mode => "read"
        start_position => "beginning"
        file_completed_action => "delete"
        type => "TS"
    }
}
filter {
    csv {
        columns => [
                "Time",
                "Device",
                "Source IP",
                "Source Port",
                "Destination IP",
                "Destination Port",
                "Action",
                "Direction",
                "Targets",
                "ID"
        ]
        separator => ","
        }
}
output {
     elasticsearch {
#        action => "index"
        hosts => ["https://127.0.0.1:9200"]
        index => "ts_reports-%{+YYYY.MM}"
        manage_template => true
        template => "/etc/logstash/ts_reports-template.json"
        user => "[username]"
        password => "[password]"
        ssl => true
        ssl_certificate_verification => true
        cacert => "[cert]"
    }
}

ts_reports-template.json

{
  "index_patterns": "ts_reports-*",
  "settings": {
    "index.number_of_replicas": 0,
    "index.refresh_interval" : "5s"
  },
  "mappings": {
    "properties": {
      "@timestamp": {
        "type": "date"
      },
      "Action": {
        "type": "keyword"
      },
      "Destination IP": {
        "type": "ip"
      },
      "Destination Port": {
        "type": "long"
      },
      "Device": {
        "type": "keyword"
      },
      "Direction": {
        "type": "keyword"
      },
      "ID": {
        "type": "keyword"
      },
      "Source IP": {
        "type": "ip"
      },
      "Source Port": {
        "type": "long"
      },
      "Targets": {
        "type": "text"
      },
      "Time": {
        "type": "date",
        "format": "MM/dd/yyyy HH:mm"
      }
    }
  }
}

GET ts_reports-2020.12/_mapping

{
  "ts_reports-2020.12" : {
    "mappings" : {
      "properties" : {
        "@timestamp" : {
          "type" : "date"
        },
        "@version" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "Action" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "Destination IP" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "Destination Port" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "Device" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "Direction" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "ID" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "Source IP" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "Source Port" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "Targets" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "Time" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "host" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "message" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "path" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "type" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}

I think we need to see if the problem is getting the template loaded vs. getting it applied to the index.

Check Elasticsearch and see if the template is loaded. Check the Logstash logs at startup and see if the template is checked for any updates; you might make a trivial change to ensure the template tries to update. I don't see you setting a template name, so it may be defaulting. If anything else is trying to update the default template, yours may be getting overwritten. Maybe give it a name so we know whether it's getting loaded, as in the sketch below.
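
For example, a minimal sketch of the output block with an explicit template name (template_overwrite is a suggestion to force the update while you debug; the rest of your settings stay as they are):

output {
    elasticsearch {
        hosts => ["https://127.0.0.1:9200"]
        index => "ts_reports-%{+YYYY.MM}"
        manage_template    => true
        template           => "/etc/logstash/ts_reports-template.json"
        template_name      => "ts_reports"   # explicit name, easy to spot with GET _cat/templates
        template_overwrite => true           # pushes the file again even if a template with that name already exists
        # user / password / ssl options unchanged
    }
}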

See compatibility note.

If you are using a custom template, ensure your template uses the _doc document-type before connecting to Elasticsearch 7.x.

I don't know if that's your issue or not, and it seems odd that you would need to add _doc since we don't use that anymore, but it's worth a shot.

Hi @aaron-nimocks,

Thank you for your interest in the issue. I added "_doc" to my custom mapping, but nothing changed.

Steps:

  • Removed all related indexes
  • Removed index pattern
  • Removed the sincedb file
  • Updated the template
  • Restarted Logstash
  • Checked the index and its mapping; all fields are still text

Please check my mapping template below and let me know if I did it right.

{
  "index_patterns" : ["ts_reports-*"],
  "template": {
    "settings": {
      "index.number_of_replicas": 0,
      "index.refresh_interval" : "5s"
    },
    "mappings": {
      "_doc": {
        "properties": {
          "@timestamp": {
            "type": "date"
          },
          "Action": {
            "type": "keyword"
          },
          "Destination IP": {
            "type": "ip"
          },
          "Destination Port": {
            "type": "long"
          },
          "Device": {
            "type": "keyword"
          },
          "Direction": {
            "type": "keyword"
          },
          "ID": {
            "type": "keyword"
          },
          "Source IP": {
            "type": "ip"
          },
          "Source Port": {
            "type": "long"
          },
          "Targets": {
            "type": "keyword"
          },
          "Time": {
            "type": "date",
            "format": "yyyy-MM-dd HH:mm:ss"
          }
        }
      }
    }
  }
}

Hi @rugenl,

Thank you for your help. I was unable to find the template in ES; perhaps it isn't being created after all, or I'm just not able to find it. I tried GET _cat/templates from the Dev console.

After checking the Logstash startup logs, I think Logstash is handling the config file and the template file properly, and I don't see any errors. Could it be that the root cause is not Logstash but ES? Please check the log below:

[2020-12-14T15:52:04,155][INFO ][logstash.runner          ] Starting Logstash {"logstash.version"=>"7.10.0", "jruby.version"=>"jruby 9.2.13.0 (2.5.7) 2020-08-03 9a89c94bcc OpenJDK 64-Bit Server VM 11.0.8+10 on 11.0.8+10 +indy +jit [linux-x86_64]"}
[2020-12-14T15:52:07,153][INFO ][logstash.monitoring.internalpipelinesource] Monitoring License OK
[2020-12-14T15:52:07,164][INFO ][logstash.monitoring.internalpipelinesource] Validated license for monitoring. Enabling monitoring pipeline.
[2020-12-14T15:52:09,271][INFO ][org.reflections.Reflections] Reflections took 49 ms to scan 1 urls, producing 23 keys and 47 values
[2020-12-14T15:52:09,991][INFO ][logstash.outputs.elasticsearch][ts_reports] Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>[https://logstash_internal:xxxxxx@127.0.0.1:9200/]}}
[2020-12-14T15:52:10,034][INFO ][logstash.outputs.elasticsearchmonitoring][.monitoring-logstash] Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>[https://logstash_system:xxxxxx@127.0.0.1:9200/]}}
[2020-12-14T15:52:10,103][WARN ][logstash.outputs.elasticsearchmonitoring][.monitoring-logstash] Restored connection to ES instance {:url=>"https://logstash_system:xxxxxx@127.0.0.1:9200/"}
[2020-12-14T15:52:10,112][INFO ][logstash.outputs.elasticsearchmonitoring][.monitoring-logstash] ES Output version determined {:es_version=>7}
[2020-12-14T15:52:10,121][WARN ][logstash.outputs.elasticsearchmonitoring][.monitoring-logstash] Detected a 6.x and above cluster: the `type` event field won't be used to determine the document _type {:es_version=>7}
[2020-12-14T15:52:10,219][WARN ][logstash.outputs.elasticsearch][ts_reports] Restored connection to ES instance {:url=>"https://logstash_internal:xxxxxx@127.0.0.1:9200/"}
[2020-12-14T15:52:10,225][INFO ][logstash.outputs.elasticsearch][ts_reports] ES Output version determined {:es_version=>7}
[2020-12-14T15:52:10,227][WARN ][logstash.outputs.elasticsearch][ts_reports] Detected a 6.x and above cluster: the `type` event field won't be used to determine the document _type {:es_version=>7}
[2020-12-14T15:52:10,269][INFO ][logstash.outputs.elasticsearchmonitoring][.monitoring-logstash] New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearchMonitoring", :hosts=>["https://127.0.0.1:9200"]}
[2020-12-14T15:52:10,343][INFO ][logstash.outputs.elasticsearch][ts_reports] Using mapping template from {:path=>"/etc/logstash/ts_reports-template.json"}
[2020-12-14T15:52:10,288][INFO ][logstash.outputs.elasticsearch][ts_reports] New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["https://127.0.0.1:9200"]}
[2020-12-14T15:52:10,391][WARN ][logstash.javapipeline    ][.monitoring-logstash] 'pipeline.ordered' is enabled and is likely less efficient, consider disabling if preserving event order is not necessary
[2020-12-14T15:52:10,418][INFO ][logstash.outputs.elasticsearch][ts_reports] Attempting to install template {:manage_template=>{"index_patterns"=>["ts_reports-*"], "template"=>{"settings"=>{"index.number_of_replicas"=>0, "index.refresh_interval"=>"5s"}, "mappings"=>{"_doc"=>{"properties"=>{"@timestamp"=>{"type"=>"date"}, "Action"=>{"type"=>"keyword"}, "Destination IP"=>{"type"=>"ip"}, "Destination Port"=>{"type"=>"long"}, "Device"=>{"type"=>"keyword"}, "Direction"=>{"type"=>"keyword"}, "ID"=>{"type"=>"keyword"}, "Source IP"=>{"type"=>"ip"}, "Source Port"=>{"type"=>"long"}, "Targets"=>{"type"=>"keyword"}, "Time"=>{"type"=>"date", "format"=>"yyyy-MM-dd HH:mm:ss"}}}}}}}
[2020-12-14T15:52:10,524][INFO ][logstash.javapipeline    ][ts_reports] Starting pipeline {:pipeline_id=>"ts_reports", "pipeline.workers"=>4, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50, "pipeline.max_inflight"=>500, "pipeline.sources"=>["/etc/logstash/conf.d/ts_reports/ts_reports.conf"], :thread=>"#<Thread:0x487c4a40@/usr/share/logstash/logstash-core/lib/logstash/pipelines_registry.rb:141 run>"}
[2020-12-14T15:52:10,535][INFO ][logstash.javapipeline    ][.monitoring-logstash] Starting pipeline {:pipeline_id=>".monitoring-logstash", "pipeline.workers"=>1, "pipeline.batch.size"=>2, "pipeline.batch.delay"=>50, "pipeline.max_inflight"=>2, "pipeline.sources"=>["monitoring pipeline"], :thread=>"#<Thread:0x1d868d40@/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:125 run>"}
[2020-12-14T15:52:23,786][INFO ][logstash.javapipeline    ][.monitoring-logstash] Pipeline Java execution initialization time {"seconds"=>13.23}
[2020-12-14T15:52:23,841][INFO ][logstash.javapipeline    ][ts_reports] Pipeline Java execution initialization time {"seconds"=>13.3}
[2020-12-14T15:52:23,937][INFO ][logstash.javapipeline    ][.monitoring-logstash] Pipeline started {"pipeline.id"=>".monitoring-logstash"}
[2020-12-14T15:52:24,453][INFO ][logstash.inputs.file     ][ts_reports] No sincedb_path set, generating one based on the "path" setting {:sincedb_path=>"/var/lib/logstash/plugins/inputs/file/.sincedb_3ceba979d686167455526811087aac6f", :path=>["/opt/ts_reports/*.csv"]}
[2020-12-14T15:52:24,494][INFO ][logstash.javapipeline    ][ts_reports] Pipeline started {"pipeline.id"=>"ts_reports"}
[2020-12-14T15:52:24,610][INFO ][filewatch.observingread  ][ts_reports][TS_Reports] START, creating Discoverer, Watch with file and sincedb collections
[2020-12-14T15:52:25,281][INFO ][logstash.agent           ] Pipelines running {:count=>3, :running_pipelines=>[:ts_reports, :".monitoring-logstash", :main], :non_running_pipelines=>[]}
[2020-12-14T15:52:27,221][INFO ][logstash.agent           ] Successfully started Logstash API endpoint {:port=>9600}

Logstash attempts to store the template:

Attempting to install template {:manage_template=>{"index_patterns"=>["ts_reports-*"], "template"=>{"settings"

But it doesn't give either a success or failure message after that. Check the Elasticsearch logs at the same timestamp for errors.

You could try loading the template via the API, at least for debugging this issue.
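
For example, a minimal sketch via the Dev console, using the legacy _template endpoint (the kind of template Logstash 7.x manages) and trimmed to a few of the fields from your file:

PUT _template/ts_reports
{
  "index_patterns": ["ts_reports-*"],
  "settings": {
    "index.number_of_replicas": 0,
    "index.refresh_interval": "5s"
  },
  "mappings": {
    "properties": {
      "Source IP":        { "type": "ip" },
      "Destination IP":   { "type": "ip" },
      "Source Port":      { "type": "long" },
      "Destination Port": { "type": "long" },
      "Time":             { "type": "date", "format": "yyyy-MM-dd HH:mm:ss" }
    }
  }
}

If that succeeds, GET _cat/templates should list ts_reports, which tells you whether the problem is loading the template or applying it.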

Remember, the template only applies when an index is created, so in your case that means the first data of the month (your index name is monthly), or you can delete the index and reload it.
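
For example, assuming the monthly index from the config above already exists from earlier tests, deleting it forces the template to be evaluated again when the next document arrives:

DELETE ts_reports-2020.12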

Unrelated, but this is my gripe about the _doc documentation: it's all very confusing. The link above DOES NOT use _doc, it uses _source. ES's own docs aren't helping clear up the confusion.


Hi @rugenl,

I added the index template manually via Dev console:

PUT _index_template/ts_reports
{
  "index_patterns" : ["ts_reports-*"],
  "template": {
    "settings": {
      "index.number_of_replicas": 0,
      "index.refresh_interval" : "5s"
    },
    "mappings": {
        "properties": {
           "@timestamp": {
             "type": "date"
            },
            "Action": {
              "type": "keyword"
            },
            "Destination IP": {
              "type": "ip"
            },
            "Destination Port": {
              "type": "long"
            },
            "Device": {
              "type": "keyword"
            },
            "Direction": {
              "type": "keyword"
            },
            "ID": {
              "type": "keyword"
            },
            "Source IP": {
              "type": "ip"
            },
            "Source Port": {
              "type": "long"
            },
            "Targets": {
              "type": "keyword"
            },
            "Time": {
              "type": "date",
              "format": "yyyy-MM-dd HH:mm:ss"
          }
        }
    }
  }
} 

I am still using the same config settings and the same template in Logstash to process the csv files. When processing the files I get this error:

[2020-12-15T09:18:58,992][WARN ][logstash.outputs.elasticsearch][ts_reports][2a9645a1283f5487a5f77a96692bfc51b88f0359e43d3c87a1613887b423df62] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"ts_reports-2020.12", :routing=>nil, :_type=>"_doc"}, #<LogStash::Event:0x1dbb7635>], :response=>{"index"=>{"_index"=>"ts_reports-2020.12", "_type"=>"_doc", "_id"=>"G93EZnYBVuAl_SuTtA_X", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse field [Destination IP] of type [ip] in document with id 'G93EZnYBVuAl_SuTtA_X'. Preview of field's value: 'Destination IP'", "caused_by"=>{"type"=>"illegal_argument_exception", "reason"=>"'Destination IP' is not an IP string literal."}}}}}
[2020-12-15T09:19:06,558][WARN ][logstash.outputs.elasticsearch][ts_reports][2a9645a1283f5487a5f77a96692bfc51b88f0359e43d3c87a1613887b423df62] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"ts_reports-2020.12", :routing=>nil, :_type=>"_doc"}, #<LogStash::Event:0xeda3f9f>], :response=>{"index"=>{"_index"=>"ts_reports-2020.12", "_type"=>"_doc", "_id"=>"QN3EZnYBVuAl_SuT2yhP", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse field [Destination IP] of type [ip] in document with id 'QN3EZnYBVuAl_SuT2yhP'. Preview of field's value: 'Destination IP'", "caused_by"=>{"type"=>"illegal_argument_exception", "reason"=>"'Destination IP' is not an IP string literal."}}}}}

Does every single record have an IP, and what does the format look like?

I can see this happening if an empty field, or something that doesn't match an IP, is passed.

Is it trying to process the csv header line? Take it out or add code to drop it.
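
For example, one way to drop it in the filter block (a sketch, assuming the header row starts with the literal column name "Time" as in the columns list above):

filter {
    csv {
        # existing columns / separator settings unchanged
    }
    # after csv parsing, the header row puts each column name into its own field,
    # so an event where [Time] equals the literal string "Time" is the header
    if [Time] == "Time" {
        drop { }
    }
}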

Hi @aaron-nimocks,

Every single record has "Time","Device","Source IP","Source Port","Destination IP","Destination Port","Action","Direction","Targets","ID". There is no empty field, although some machines may have both IPv4 and IPv6 or even more than one IPv4 address.

I also realized that ES is adding a second index template (a legacy one) besides the index template that I manually added via the Dev console (which is not legacy). Since I added an index template manually, should I remove the following lines from the pipeline config file in Logstash?

    manage_template => true
    template => "/etc/logstash/ts_reports-template.json"
    template_name => "ts_reports"

This is the second template created dynamically:

{
  "_doc": {
    "dynamic_templates": [
      {
        "message_field": {
          "path_match": "message",
          "mapping": {
            "norms": false,
            "type": "text"
          },
          "match_mapping_type": "string"
        }
      },
      {
        "string_fields": {
          "mapping": {
            "norms": false,
            "type": "text",
            "fields": {
              "keyword": {
                "ignore_above": 256,
                "type": "keyword"
              }
            }
          },
          "match_mapping_type": "string",
          "match": "*"
        }
      }
    ],
    "properties": {
      "@timestamp": {
        "type": "date"
      },
      "geoip": {
        "dynamic": true,
        "properties": {
          "ip": {
            "type": "ip"
          },
          "latitude": {
            "type": "half_float"
          },
          "location": {
            "type": "geo_point"
          },
          "longitude": {
            "type": "half_float"
          }
        }
      },
      "@version": {
        "type": "keyword"
      }
    }
  }
}

This could be an issue if the data is not just 1 IP address.

Yes.
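
A minimal sketch of what the output block could then look like, relying on the _index_template you created in the Dev console instead of the file managed by Logstash:

output {
    elasticsearch {
        hosts => ["https://127.0.0.1:9200"]
        index => "ts_reports-%{+YYYY.MM}"
        manage_template => false   # the template is managed manually in Elasticsearch
        # user / password / ssl options unchanged
    }
}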

Should I then treat IP fields as string/keyword instead?

I think that's your only choice unless you clean up the data first.

I quickly checked all the entries in the two logs I am using for this test, and I found:

  • None of the entries have more than one Source IP or Destination IP address
  • All IP entries are IPv4 addresses

I also disabled the template-related lines in the Logstash config file. The additional index template was not created this time, but I still got parsing errors:

[2020-12-15T10:23:58,091][WARN ][logstash.outputs.elasticsearch][ts_reports][29a4990c7a28f3228f10ecc483bdc759187d378f5b9b9c127430a1a89bfb60fa] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"ts_reports-2020.12", :routing=>nil, :_type=>"_doc"}, #<LogStash::Event:0x6b60aa9a>], :response=>{"index"=>{"_index"=>"ts_reports-2020.12", "_type"=>"_doc", "_id"=>"teIAZ3YBVuAl_SuTBnpu", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse field [Source Port] of type [long] in document with id 'teIAZ3YBVuAl_SuTBnpu'. Preview of field's value: 'Source Port'", "caused_by"=>{"type"=>"illegal_argument_exception", "reason"=>"For input string: \"Source Port\""}}}}}
[2020-12-15T10:24:07,923][WARN ][logstash.outputs.elasticsearch][ts_reports][29a4990c7a28f3228f10ecc483bdc759187d378f5b9b9c127430a1a89bfb60fa] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"ts_reports-2020.12", :routing=>nil, :_type=>"_doc"}, #<LogStash::Event:0x2e5aedc3>], :response=>{"index"=>{"_index"=>"ts_reports-2020.12", "_type"=>"_doc", "_id"=>"meIAZ3YBVuAl_SuTZJjj", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse field [Source Port] of type [long] in document with id 'meIAZ3YBVuAl_SuTZJjj'. Preview of field's value: 'Source Port'", "caused_by"=>{"type"=>"illegal_argument_exception", "reason"=>"For input string: \"Source Port\""}}}}}

This is one line in the csv file:

"2020-11-10 00:00:25","AXTR","192.168.0.93","56971","100.75.228.42","57181","block","out","BS - IPs","AXWvj6zfh7B56uHIaWKr"

Also, the system is adding extra fields after processing the files.
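
A possible sketch of a mutate filter that drops the extra fields Logstash adds (names taken from the mapping shown earlier; keep any of them you actually want):

filter {
    mutate {
        remove_field => ["message", "host", "path", "@version"]
    }
}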

Did you check @rugenl's suggestion about the CSV header? If it tries to process the header, it will also fail.

No, I missed that comment from @rugenl. I'll try that next. If that works, I'll need to add code to remove the header, since all the files come with it.

Another comment in the meantime: I tried adding one of the csv files via Machine Learning/Import Data. There is an option there to specify whether the csv file includes a header.

Pros:
The index created looks great. The index pattern has only 16 fields instead of the 26 created with the previous index template. The 6 extra fields look like system fields that I don't mind having: @timestamp, _index, _score, _source, _type, _id (although each event already includes an ID field, as you can see in the sample event line above).

I have no use for the _score and _type fields, so I could remove those, but that's not a big deal right now.

Cons:

  • No index template is created
  • The extra @timestamp field shows the right time of the event, but the original Time field shows a very different date and time

I would leave the default fields as is.

The @timestamp field is most likely the time Logstash processed the event. Typically people use @timestamp as the time of the event, so they would copy Time into @timestamp. In your index pattern I would make sure to choose the Time field for time series data.
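
For example, a sketch of a date filter that parses the Time column (format taken from your sample csv line) into @timestamp:

filter {
    date {
        match  => ["Time", "yyyy-MM-dd HH:mm:ss"]
        target => "@timestamp"   # this is the default target, shown here for clarity
    }
}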

Ironically, in this case it is the @timestamp field that is correct. At first I also thought it was the date when the log was processed by Logstash, but it is not: the log is being processed today, 2020-12-15, yet the @timestamp date is 2020-11-10, which exactly matches the line corresponding to that event (I located it by its ID field). I don't know why Time ends up with such a different value in this automated Machine Learning import:

Corresponding csv line:
"2020-11-10 01:54:15","GVB-467","10.1.10.200","53219","103.224.212.221","443","block","out","WG - IPs","AXWv74QzILWXbO4M30P6"

Do I need to convert .csv to .json before Logstash can process and index the file? I thought the csv plugins were designed to do that for me.

Do I need to use the CSV codec plugin plus the CSV filter plugin, or just one of them in Logstash? Which would be the best (or easiest) option?

I am having a hard time finding the right combination/config to process a very simple csv file with just 10 fields. Perhaps this is a very simple task for an advanced user with a lot of experience in ELK, but for me it's becoming a challenge.

Is my config file wrong? Is my template file wrong? Am I missing additional configs?

Please help me to see the light :sweat:

Thanks

You need to have a CSV filter.

This is where you set your column names, separator, header-skipping option, etc.

This filter takes those parameters and then transforms each csv line into a structured JSON document for you.
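
For reference, a minimal sketch of such a filter for this file (column names taken from the config above; skip_header assumes the header values match the column names exactly, and the port/date handling is a suggestion, not required):

filter {
    csv {
        columns => [
            "Time", "Device", "Source IP", "Source Port",
            "Destination IP", "Destination Port", "Action",
            "Direction", "Targets", "ID"
        ]
        separator   => ","
        skip_header => true       # drops a row whose values equal the column names
        convert     => {
            "Source Port"      => "integer"
            "Destination Port" => "integer"
        }
    }
    date {
        match => ["Time", "yyyy-MM-dd HH:mm:ss"]   # format of the Time column in the sample line
    }
}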