I am struggling to import fields into Elastic with the correct types. Numbers are being stored as strings, and strings are being stored as IP addresses, and I wonder whether my template is the cause. I eventually gave up on getting the IP address stored as a string and started dropping the field when it does not match the IPv4 format, since Elastic was rejecting the events.
I delete the indices from Elastic and from the Kibana settings page.
$ curl -XDELETE 'http://localhost:9200/access_log-*'
{"acknowledged":true}
I import the template. I have switched clientIP back to ip, but I also tried importing it as string.
$ curl -XPUT 'http://localhost:9200/_template/access_log?pretty' -d@/etc/elasticsearch/scripts/access_log.template.json
{
"acknowledged" : true
}
$ curl -s 'localhost:9200/_template/access_log?pretty' | egrep -A2 "bytes|clientIP|status"
"clientIP" : {
"index" : "not_analyzed",
"type" : "ip"
--
"bytes" : {
"index" : "not_analyzed",
"type" : "long"
--
"status" : {
"index" : "not_analyzed",
"type" : "long"
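Since a template is only applied when an index is created, it may also be worth checking the mapping on the live index and not only the template. A minimal sketch that reuses the same egrep-style filter; ES_URL and show_field_types are hypothetical names introduced here for illustration:

```shell
# Hypothetical helper names; adjust the URL to match the cluster.
ES_URL="${ES_URL:-http://localhost:9200}"

# Reads pretty-printed mapping JSON on stdin and keeps only the
# bytes, clientIP, and status entries plus the two lines after each.
show_field_types() {
  grep -E -A2 '"(bytes|clientIP|status)" :'
}

# Against a running cluster (not executed here):
#   curl -s "$ES_URL/access_log-*/_mapping?pretty" | show_field_types
```

If the types printed here differ from the template, the index was probably created before the template was in place, or from a different template.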
I edit and check the config.
$ vi /etc/logstash/conf.d/kafka/config.json
$ /opt/logstash/bin/logstash -t -f /etc/logstash/conf.d/kafka/config.json
Configuration OK
Here is the grok pattern for the message. I have switched from IPORHOST to DATA because our security scans are putting garbage in the first two fields.
"message" => "%{DATA:serverName} %{DATA:clientIP} %{QS:xForwardedFor} (?:%{USER:ident}|-) (?:%{USER:auth}|-) \[%{HTTPDATE:timeStamp}\] \"(?:%{WORD:method} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpVersion})?|%{DATA:rawRequest})\" %{NUMBER:status} (?:%{NUMBER:bytes}|-) (?:%{NUMBER:duration}|-) (?:%{DATA:browserTrackingId}|-) %{QS:referrer} %{QS:agent}"
I convert the numbers to integers.
mutate {
  convert => {
    "bytes" => "integer"
    "duration" => "integer"
    "status" => "integer"
  }
}
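The step that drops clientIP when it is not an IPv4 address is not shown in this post. A rough sketch of what such a conditional could look like is below; the regex is illustrative only, is not copied from the actual config, and checks only the dotted-quad shape, not the 0-255 range of each octet:

```
filter {
  # Assumption: clientIP has already been extracted by the grok filter.
  if [clientIP] and [clientIP] !~ /^\d{1,3}(\.\d{1,3}){3}$/ {
    mutate {
      # Remove the field so the "ip" mapping does not reject the event.
      remove_field => [ "clientIP" ]
    }
  }
}
```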
I output the event to Elastic.
elasticsearch {
  hosts => [...]
  manage_template => false
  index => "access_log-%{+YYYY.MM.dd}"
  document_type => "access_log"
}
When I start Logstash and add the index back into Kibana, I see bytes and status listed as string on the Kibana settings page, and they have a t next to them on the Discover page.
Here is the template, with most of the fields removed for brevity. I import multiple templates, and the section from _default_ to properties is the same in all of them. I am not sure whether that is an issue.
{
  "template": "access_log-*",
  "order" : 0,
  "mappings": {
    "_default_": {
      "_all": {
        "enabled": false
      },
      "dynamic_templates": [
        {
          "template1": {
            "mapping": {
              "doc_values": true,
              "ignore_above": 1024,
              "index": "not_analyzed",
              "type": "{dynamic_type}"
            },
            "match": "*"
          }
        }
      ],
      "properties": {
        "@timestamp": { "type": "date", "format": "strict_date_optional_time||epoch_millis" },
        "@version": { "type": "string" },
        ...
        "bytes": { "type": "long", "index": "not_analyzed" },
        "clientIP": { "type": "ip", "index": "not_analyzed" },
        "count": { "type": "long" },
        "duration": { "type": "long", "index": "not_analyzed" },
        ...
        "status": { "type": "long", "index": "not_analyzed" },
        ...
      }
    }
  }
}
I would appreciate any help,
Wes.