Wrong mapping, need to change it / lost in the how-to and documentation

Hello,
Sorry to bother you all, and sorry for any "lack of information" on my side.

I'm running the ELK stack to index syslog logs from a Fortinet Analyzer, and all of my "fields" are currently STRING, with a few exceptions...

Here's my config file:

`"10-network_log.conf" 36L, 513C 26,3 All
input {
file {
path => ["/var/log/network.log"]
start_position => "beginning"
type => "syslog"
}
}

filter{

grok {
match => [
"message",
"%{TIMESTAMP_ISO8601:logtimestamp} %{GREEDYDATA:kv}"
]
remove_field => ["message"]
}

kv {
source => "kv"
field_split => " "
value_split => "="
}

date {
match => ["logtimestamp", "ISO8601"]
locale => "en"
remove_field => ["logtimestamp"]
}

geoip{
source =>"dstip"
database =>"/opt/logstash/GeoLiteCity.dat"
}

}`

I need to change it so I can do a "scripted field" later to see actual bandwidth usage (and I might as well make other changes later), using this: doc['rcvdbyte'].value + doc['sentbyte'].value

I'm running the latest version of the ELK stack.
Kibana 4.4.1, for all that matters.

Please help, I don't want to mess up all my stuff, and to be honest... the documentation is awesome! BUT I would like to have something ready to eat... I learn way faster that way... plus it's sort of a production environment (actually it's an IMPORTANT POC)...

Actual indices:

Actual mapping example:

wget http://localhost:9200/_mapping?pretty=1; cat _mapping?pretty=1 |grep -A10 -B10 rcvdbyte
wget http://localhost:9200/_mapping?pretty=1; cat _mapping?pretty=1 |grep -A10 -B10 sentbyte

"rcvdbyte" : {
"type" : "string",
"norms" : {
"enabled" : false
},
"fielddata" : {
"format" : "disabled"
},
"fields" : {
"raw" : {
"type" : "string",

          "format" : "disabled"

Same goes for sentbyte.
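(For reference, the same information can apparently be pulled for just those fields with the field-mapping API, if that is easier to read:)

curl -s 'http://localhost:9200/logstash-*/_mapping/field/rcvdbyte,sentbyte?pretty'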

Hi,

Before indexing your logs in Elasticsearch, make sure to create a mapping for your index. By default, Elasticsearch treats everything as a string.

See this:

https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-put-mapping.html

In your case, if the name of the index is "test", the mapping should be:

PUT http://localhost:9200/test/_mapping/syslog
{
  "properties": {
    "rcvdbyte": {
      "type": "double"
    },
    "sentbyte": {
      "type": "double"
    }
    // and so on for the rest of the numeric fields
  }
}

For string types, check the analyzer/tokenizer settings too.
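For example (just a sketch, using one of the string fields from the config above), a string field that you only filter or aggregate on can be mapped as not_analyzed so it is not tokenized:

"dstip": {
  "type": "string",
  "index": "not_analyzed"
}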

Thank you for the reply.

That is where I'm somehow lost... I currently already have a few indices, named "logstash-DATEOFTHEDAY",
like that:

yellow open logstash-2016.07.21 5 1 3590233 0 3.9gb 3.9gb
yellow open logstash-2016.07.20 5 1 4574644 0 4.6gb 4.6gb
yellow open .kibana 1 1 12 104 170.4kb 170.4kb
yellow open logstash-2016.07.19 5 1 4573488 0 4.7gb 4.7gb
yellow open logstash-2016.07.16 5 1 2690013 0 2.6gb 2.6gb
yellow open logstash-2016.07.15 5 1 4803392 0 4.6gb 4.6gb
yellow open logstash-2016.07.18 5 1 4723400 0 4.7gb 4.7gb
yellow open logstash-2016.07.17 5 1 2845620 0 2.5gb 2.5gb
yellow open logstash-2016.07.14 5 1 6467284 0 5.8gb 5.8gb
yellow open logstash-2016.07.13 5 1 3160746 0 2.9gb 2.9gb

In a perfect world, I can't lose those indices, BUT I need to "mutate" rcvdbyte, sentbyte and probably dstip, srcip and such to fit my needs, all without losing data.

I have read this: Changing Mapping with Zero Downtime | Elastic Blog
Pretty much what I need, but... as I said in my OP, for a reason I can't figure out yet, I get lost in all that documentation.

So... if I get it correctly, the correct and fastest way to do what I need is to modify my current config file, adding a mutate convert so my fields become what I need them to be...

then wipe the currently existing indices, since I won't be able to use them anyway to generate visualizations or anything else... THEN start over?

input {
  file {
    path => ["/var/log/network.log"]
    start_position => "beginning"
    type => "syslog"
  }
}

filter {

  grok {
    match => [
      "message",
      "%{TIMESTAMP_ISO8601:logtimestamp} %{GREEDYDATA:kv}"
    ]
    remove_field => ["message"]
  }

  kv {
    source => "kv"
    field_split => " "
    value_split => "="
  }

  date {
    match => ["logtimestamp", "ISO8601"]
    locale => "en"
    remove_field => ["logtimestamp"]
  }

  geoip {
    source => "dstip"
    source => "srcip"
    database => "/opt/logstash/GeoLiteCity.dat"
  }

  mutate {
    convert => ["rcvdbyte", "integer"]
    convert => ["sentbyte", "integer"]
  }

}

Does it seem correct? What other converts should I do, anything on the geo fields? Do they need to be float or something? Convert srcip to IP?

Thank you... I might just figure out my correct config, then... start all over "again".

Logstash mutate will be effective only for data yet to be indexed, not the existing data. And even after that, an ES mapping is required. All you have to do is:

  1. Create aliases for your existing indexes

  2. Create a new index with the correct mapping
       "rcvdbyte": {
         "type": "double"
       },
       "sentbyte": {
         "type": "double"
       }
     and for the IP addresses too

  3. Reindex your data into the new index (rough curl sketch below)
     https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html

  4. Update the alias settings and delete the old index
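A rough sketch of steps 3 and 4 (the alias name "network-logs" and the "-v2" index are made up for illustration, and the _reindex API assumes Elasticsearch 2.3 or later; on older versions you would reindex with Logstash or a scroll/bulk script):

# 3. Copy documents from the old index into the new, correctly mapped one
curl -XPOST 'http://localhost:9200/_reindex' -d '{
  "source": { "index": "logstash-2016.07.21" },
  "dest":   { "index": "logstash-2016.07.21-v2" }
}'

# 4. Point the alias at the new index, then drop the old one
curl -XPOST 'http://localhost:9200/_aliases' -d '{
  "actions": [
    { "remove": { "index": "logstash-2016.07.21",    "alias": "network-logs" } },
    { "add":    { "index": "logstash-2016.07.21-v2", "alias": "network-logs" } }
  ]
}'
curl -XDELETE 'http://localhost:9200/logstash-2016.07.21'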

If this is still too much, I can post an example which uses the same fields as yours :)

Thank you for the hand!

Unfortunately, that's where I'm actually stuck...
The problem with the data I ingest is that somehow it creates an automatic mapping (all strings), I think it's called "dynamic mapping"? BUT it also creates the "type" by itself (probably created while doing the kv filter and grok... I'm not sure).

When I tried to create my own mapping, it was a failure, hence why I'm asking for a "pre-made" concept.

I'm deeply sorry for being such a pain in the a** hehe. I index Fortigate logs using "syslog format" as shown in my config earlier, and as you also saw, it all gets indexed into the daily generated logstash-YYYY.MM.DD indices.

So IF I understand correctly, I would need to figure out how to create my mapping correctly (only for the specific fields I need, or all of them?), then change my output config so it gets pushed to the right index?

I'm sorry, really, I'm lost. I mean I think I understand the whole concept and logic, but unfortunately Lucene and such is a completely different world for me... BTW, has nobody ever created a "console menu" or "web interface" to manage those damn mappings!?

Haha, thanks a lot, I'm waiting for your reply. Feel free to let me know if ANYTHING is missing for you to be able to help me (and sorry for my English).

I totally understand. Below is an example which I tried. By default, Elasticsearch will index everything as "string" unless specified otherwise. Try this before going deep into analyzers etc.

Step-1 Logstash
input {
  file {
    path => "/home/ubuntu/Logstash_input/access1.log"  # path of the input file
    type => "apache"                                    # set the sourcetype
    start_position => "beginning"                       # followtail=0
  }
}

filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }    # built-in pattern for Apache
    # pattern => ["%{COMBINEDAPACHELOG}"]
  }
  date {
    locale => "en"
    match => ["timestamp", "dd/MMM/YYYY:HH:mm:ss Z", "ISO8601"]  # identify the timestamp in the data, else current time is set
    timezone => "Asia/Kolkata"
    target => "@timestamp"
  }
  geoip {                          # set geo coordinates
    source => "[clientip]"         # source field
    target => "geoip"              # target field name
  }
}

output {
  elasticsearch {                  # output to ES
    hosts => ["172.30.0.73:9200"]
    index => "apache_access"       # index name
  }
  stdout { codec => rubydebug }    # also print to stdout
}

Step-2 Elasticsearch Mapping

curl -XPOST 'http://localhost:9200/apache_access' -d '{
  "settings" : {
    "number_of_shards" : 1
  },
  "mappings" : {
    "apache" : {
      "properties" : {
        "bytes":    { "type": "long" },
        "response": { "type": "long" },
        "clientip": { "type": "ip" }
      }
    }
  }
}'
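Once the index is created, you can check that the mapping was applied (same index name as above):

curl 'http://localhost:9200/apache_access/_mapping?pretty'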

Sample Log
182.236.164.11 - - [10/Apr/2015:18:20:50 +0530] "GET /category.screen?categoryId=STRATEGY&JSESSIONID=SD6SL8FF10ADFF53101 HTTP 1.1" 200 1200 "http://www.google.com" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_4) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.46 Safari/536.5" 490
182.236.164.11 - - [10/Apr/2015:18:20:52 +0530] "GET /product.screen?productId=MB-AG-G07&JSESSIONID=SD6SL8FF10ADFF53101 HTTP 1.1" 200 1035 "http://www.buttercupgames.com/category.screen?categoryId=ARCADE" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_4) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.46 Safari/536.5" 461
182.236.164.11 - - [10/Apr/2015:18:20:53 +0530] "POST /cart.do?action=addtocart&itemId=EST-6&productId=MB-AG-G07&JSESSIONID=SD6SL8FF10ADFF53101 HTTP 1.1" 200 533 "http://www.buttercupgames.com/product.screen?productId=MB-AG-G07" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_4) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.46 Safari/536.5" 470

I would suggest trying a simple example first; then it is easy to do the same for other, more complex grok patterns.

Hope this helps!

There, right THERE.

type = it NEVER worked. That's where I figured that it must have set its OWN type while indexing...

How would someone run an "alternative Logstash config file" to test without losing anything?
I tried this: # /opt/logstash/bin/logstash -f /etc/logstash/test_mapping/10-fortigate-maptest.conf

It never created the index, so I can only assume that for the 5 minutes I let it run, it simply indexed my actual data twice?

10-fortigate-maptest.conf:

input {
  file {
    path => ["/var/log/network.log"]
    start_position => "beginning"
    type => "syslog"
  }
}

filter {

  grok {
    match => [
      "message",
      "%{TIMESTAMP_ISO8601:logtimestamp} %{GREEDYDATA:kv}"
    ]
    remove_field => ["message"]
  }

  kv {
    source => "kv"
    field_split => " "
    value_split => "="
  }

  date {
    match => ["logtimestamp", "ISO8601"]
    locale => "en"
    remove_field => ["logtimestamp"]
  }

  mutate {
    convert => ["rcvdbyte", "integer"]
    convert => ["sentbyte", "integer"]
    convert => ["bandwidth", "integer"]
  }

  geoip {
    source => "dstip"
    database => "/opt/logstash/GeoLiteCity.dat"
  }

}

50-output.conf:

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "ftg-TEST-%{+YYYY.MM.dd}"
  }

  stdout { codec => rubydebug }

}

Thanks again for your time, greatly appreciated.

The type you mention in the Logstash config file has to be the same as the one you mention in the ES mapping:

input {
  file {
    path => ["/var/log/network.log"]
    start_position => "beginning"
    type => "syslog"          # <-- this type
  }
}

ES (I am not sure about the exact index name, so "ftg-TEST..." below is just a placeholder):

curl -XPOST 'http://localhost:9200/ftg-TEST...' -d '{
  "settings" : {
    "number_of_shards" : 1
  },
  "mappings" : {
    "syslog" : {
      "properties" : {
        "rcvdbyte":  { "type": "integer" },
        "sentbyte":  { "type": "integer" },
        "bandwidth": { "type": "integer" }
      }
    }
  }
}'

Let me show you the mapping actually in place for "one index":

http://pastebin.com/NLWGpQCc (I couldn't upload the txt file, so...)
Maybe it will help visualise how ES automatically processed my old config file?

I'm not sure if it makes sense?

I mean, my mapping looks completely different from your suggested sample,
in terms of "structure" and "options", I guess...

Once again, thank you.

BTW, I'm willing, if needed, to scrap all of my current indices, if I can get it to work correctly by playing with my Logstash config.

If ever it helps.