How to make Logstash split event field values and assign them to a @metadata field

Hi, I have a Logstash event with the following fields:

{
  "_index": "logstash-2016.08.09",
  "_type": "log",
  "_id": "AVZvz2ix",
  "_score": null,
  "_source": {
    "message": "function_name~execute||line_no~128||debug_message~id was not found",
    "@version": "1",
    "@timestamp": "2016-08-09T14:57:00.147Z",
    "beat": {
      "hostname": "coredev",
      "name": "coredev"
    },
    "count": 1,
    "fields": null,
    "input_type": "log",
    "offset": 22299196,
    "source": "/project_root/project_1/log/core.log",
    "type": "log",
    "host": "coredev",
    "tags": [
      "beats_input_codec_plain_applied"
    ]
  },
  "fields": {
    "@timestamp": [
      1470754620147
    ]
  },
  "sort": [
    1470754620147
  ]
}

I am wondering how to use a filter (kv maybe?) to extract core.log from "source": "/project_root/project_1/log/core.log" and put it in e.g. [@metadata][log_type], so that later on I can use log_type in the output to build a unique index name composed of hostname + log type + timestamp, e.g.

output {
  elasticsearch {
    hosts => "localhost:9200"
    manage_template => false
    index => "%{[@metadata][_source][host]}-%{[@metadata][log_type]}-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
  }
  stdout { codec => rubydebug }
}

Use a grok filter.

grok {
  match => ["source", "/(?<[@metadata][log_type]>[^/]+)$"]
}

... to create an unique index, composing of hostname + logtype + timestamp ...

Why so many indexes? Are you aware of the constant per-shard overhead?

Hi, yes, I know that each index has a fixed overhead and that this can be resource-intensive for aggregations. The harvested logs come from different modules of the whole system, so their fields have different meanings.

Hi, I am wondering what the syntax is behind

"/(?<[@metadata][log_type]>[^/]+)$"

grok's match option seems to use => between field and value?

You can use

match => { "source" => "/(?<[@metadata][log_type]>[^/]+)$" }

if you prefer. They're equivalent.

Is "/(?<[@metadata][log_type]>[^/]+)$" a regex? How does it work?

/(?<[@metadata][log_type]>[^/]+)$ is a regexp, yes. After a slash it captures one or more characters that are not slashes ([^/]+), up to the end of the string, and stores them in the [@metadata][log_type] field. For /project_root/project_1/log/core.log the captured value is core.log.

Thanks for the explanation. Where can I find a reference for the syntax of the expression? It looks a bit different from standard regex, particularly the (?<...>) part that stores the value.

No, that's pretty standard stuff (for modern regexps at least).

http://www.regular-expressions.info/named.html
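
If you want to see the named capture in action, here is a minimal, untested sketch (note that I'm using a plain log_type field name here, just for the demo):

input { stdin {} }

filter {
  # Capture everything after the last slash of the incoming line,
  # e.g. "/project_root/project_1/log/core.log" becomes "core.log".
  grok {
    match => { "message" => "/(?<log_type>[^/]+)$" }
  }
}

output { stdout { codec => rubydebug } }

Paste a path such as /project_root/project_1/log/core.log on stdin and the rubydebug output should show log_type => "core.log".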

Hi, I tried the configuration for extracting the log type:

filter {
  grok {
    match => ["source", "/(?<[@metadata][log_type]>[^/]+)$"]
  }

  kv {
    value_split => "~"
    field_split => "||"
  }

  date {
    locale => "en"
    match => [ "timestamp", "yyyy-MM-dd HH:mm:ss,SSS" ]
  }
}

output {
  elasticsearch {
    hosts => "172.17.0.2:9200"
    manage_template => false
    index => "%{host}-%{[@metadata][log_type]}-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][log_type]}"
  }
  stdout { codec => rubydebug }
}

but it generates the following error:

logstash_1       | {:timestamp=>"2016-08-16T10:09:41.352000+0000", :message=>"Pipeline aborted due to error", :exception=>#<RegexpError: invalid char in group name <[@metadata][log_type]>: /\/(?<[@metadata][log_type]>[^\/]+)$/m>, :backtrace=>["org/jruby/RubyRegexp.java:1434:in `initialize'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/jls-grok-0.11.2/lib/grok-pure.rb:127:in `compile'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-filter-grok-2.0.5/lib/logstash/filters/grok.rb:264:in `register'", "org/jruby/RubyArray.java:1613:in `each'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-filter-grok-2.0.5/lib/logstash/filters/grok.rb:259:in `register'", "org/jruby/RubyHash.java:1342:in `each'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-filter-grok-2.0.5/lib/logstash/filters/grok.rb:255:in `register'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.4-java/lib/logstash/pipeline.rb:182:in `start_workers'", "org/jruby/RubyArray.java:1613:in `each'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.4-java/lib/logstash/pipeline.rb:182:in `start_workers'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.4-java/lib/logstash/pipeline.rb:136:in `run'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.4-java/lib/logstash/agent.rb:473:in `start_pipeline'"], :level=>:error}
logstash_1       | {:timestamp=>"2016-08-16T10:09:44.354000+0000", :message=>"stopping pipeline", :id=>"main"}
dockerelk_logstash_1 exited with code 0

How can I resolve this issue?

Thanks

Okay. It seems you can't have square brackets (or at signs?) in named capture destinations. Pick another field name. You can use a mutate filter to rename it to [@metadata][log_type] later on.

Thanks for replying. What do you mean by 'pick another field name'?

cheers

Don't call the field [@metadata][log_type] in the grok expression. Call it something else.
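
For example, something like this (an untested sketch adapted from your filter):

filter {
  grok {
    # Use a plain temporary field name; square brackets are not
    # allowed inside a named capture group.
    match => { "source" => "/(?<log_type>[^/]+)$" }
  }
  mutate {
    # Move the value into @metadata so it's available to the output
    # block but isn't stored in the event itself.
    rename => { "log_type" => "[@metadata][log_type]" }
  }
}

Your output can then reference %{[@metadata][log_type]} as before.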