How to handle multiple inputs with Logstash to different indices

Directory Structure:

Results
  Project1
    RUN1
    RUN2
  Project2
    RUN1
    RUN2

"Results" directory contains Project1 & Project2 sub directories. Also there might be more "Project....n" sub dir gets created depending upon test run for several projects.

Each project directory contains more than one RUN directory.

I want to process each project directory as and when it is created, and send its output to a different index in Elasticsearch.

e.g. for Project1 the index should be set to "Logstash-Project1-%{+YYYY.MM.dd}", for Project2 to "Logstash-Project2-%{+YYYY.MM.dd}", and so on.

How do I handle the input and output?

input {
  file {
    path => "/Results/Project*/RUN*/*.csv"
    start_position => "beginning"
  }
}

output {
  elasticsearch {
    action => "index"
    host => "localhost"
    index => "logstash-%{+YYYY.MM.dd}"
  }
}


Use the grok filter to extract the project name from the input file path (stored in the path field), then reference that field when setting the index pattern of the elasticsearch output.

elasticsearch {
  ...
  index => "logstash-%{project}-%{+YYYY.MM.dd}"
}

I don't know how to use the grok filter to extract the project name from the input file path. What pattern do I need to extract the project from the path? Any help?

If you want to extract the first directory component below a directory named Results, this is untested but should work:

grok {
  match => ["path", "/Results/(?<project>[^/]+)/"]
}
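Wired into the full pipeline, that looks roughly like this (untested sketch; the mutate step is an extra suggestion, since Elasticsearch index names must be lowercase and the "Logstash-Project1-..." names above are mixed-case):

filter {
  grok {
    match => ["path", "/Results/(?<project>[^/]+)/"]
  }
  # Normalize the captured directory name ("Project1" -> "project1"),
  # because Elasticsearch index names must be lowercase.
  mutate {
    lowercase => ["project"]
  }
}

output {
  elasticsearch {
    action => "index"
    host => "localhost"
    index => "logstash-%{project}-%{+YYYY.MM.dd}"
  }
}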

Thank you so much, you saved my day!

Sorry to disturb you again.
But I've since learned that the file path is actually something like:
path => "/opt/Results/*/Run/webobj.csv"

The grok filter I want should extract the project name from the /*/ component of the path.

Can you please suggest an appropriate match for getting the * value immediately after the "Results/" directory?

Is there any way I can generate a grok pattern to match?

Your help is much appreciated. Thanks again.

The wildcard pattern in the input's path option has nothing to do with the path field on events. That field's value is the name of the actual file a particular log message came from.
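So for a file like /opt/Results/Project1/Run/webobj.csv the path field contains that full name, and something like this (untested) should capture the directory immediately after Results:

grok {
  match => ["path", "/opt/Results/(?<project>[^/]+)/"]
}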

Great help, I got it working:

grok {
  match => ["path", "/opt/Log/Results/(?<project>[^/]+)/"]
}

Thanks

On Linux it worked, so I thought I'd run it on Windows using

grok {
  match => ["path", "C:\Test\Result(?<project>[^/]+)"]
}

This didn't work. Any idea why?

Backslashes are metacharacters in regexps, so to match literal backslashes you need to use "\\".

grok {
  match => ["path", "C:\Test\Result\(?<project>[^\]+)\"]
}

When I run Logstash with the above config (as you suggested) I get: Error: Expected one of #, {, ,, ] at line ...

It looks like you're escaping the closing double quote. This is what you need:

grok {
  match => ["path", "C:\\Test\\Result\\(?<project>[^\\]+)\\"]
}
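(Each \\ in the pattern matches a single literal backslash and [^\\]+ captures everything up to the next backslash, so a hypothetical path like C:\Test\Result\Project1\webobj.csv would yield project => "Project1".)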

I am using Logstash 2.3.2 and the "type" option is not working:

input {
  jdbc {
    jdbc_driver_library => ..
    jdbc_driver_class => ..
    jdbc_connection_string => ..
    jdbc_user => ..
    jdbc_password => ..
    statement => ..
    type => "deploy"
  }
}
output {
  if [type] == "deploy" {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "metricslog"
    }
  }
}

I find that grok can't make the "path" and "message" matches work together, like this:

match => {
  "message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}"
  "path" => "/logfile/(?<project>\w+?)/"
}

The "message" match will fail.

I suspect you'll want to split that grok filter in two to make sure both expressions are always evaluated.
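For example (untested), reusing the two expressions above; by default a grok filter stops after its first successful match (break_on_match defaults to true), so separate filters guarantee both always run:

grok {
  match => { "path" => "/logfile/(?<project>\w+?)/" }
}
grok {
  match => { "message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}" }
}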

@magnusbaeck

Hi Magnus,

First of all, thanks for your previous posts about the index. I've tried your suggestion but I don't know what I did wrong, as it doesn't work for me... (Please let me know if you'd prefer I open a new topic.)

I have two log files, stored in the directories /tmp/toto/first/ and /tmp/toto/second/. I want the index names to be distinguished by the project name ("first" and "second" in this case). Here are my configurations:

Filebeat:

...
  paths:
    - /tmp/toto/*/*.log
... 

Logstash

input {
    beats {
        port => "5043"
    }
}

filter {
    grok {
        match => { "path" => "/tmp/toto/(?<project>[^/]+)/" }
        match => { "message" => "%{COMBINEDAPACHELOG}"}
    }
    date {
    match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
  }

}
output {
    elasticsearch {
        hosts => [ "127.0.0.1:9200" ]
        index => [ "log-%{project}-%{+YYYY.MM.dd}" ]
    }
}

After starting all the services, it seems Elasticsearch doesn't resolve the project variable:

health status index                     uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   log-%{project}-2016.06.20 QNPYvvFqRzGFEC9_32da7g   5   1        450            0    807.6kb        807.6kb
yellow open   log-%{project}-2017.02.22 RzDgOdWKQXqmItnm0zNdGw   5   1        232            0    138.5kb        138.5kb

Do you have any ideas?

It looks like the event doesn't have a project field. Check what the events in the log-%{project}-2017.02.22 index look like.

@magnusbaeck

Here is an example of the output. I think you're right, I don't have the project field. Do you know how to correct it?

{
  "_index": "log-%{project}-2017.02.22",
  "_type": "log",
  "_id": "AVqAEzYEO-tgZd20LulR",
  "_score": null,
  "_source": {
    "request": "/",
    "agent": "\"Links (1.03; Linux 2.6.32-642.6.2.el6.x86_64 x86_64; dump)\"",
    "offset": 172883,
    "auth": "-",
    "ident": "-",
    "input_type": "log",
    "verb": "GET",
    "source": "/tmp/toto/first/access_20170222.log",
    "message": "10.124.49.22 - - [22/Feb/2017:23:00:02 +0100] \"GET / HTTP/1.1\" 404 2916 \"-\" \"Links (1.03; Linux 2.6.32-642.6.2.el6.x86_64 x86_64; dump)\" ",
    "type": "log",
    "tags": [
      "beats_input_codec_plain_applied"
    ],
    "referrer": "\"-\"",
    "@timestamp": "2017-02-22T22:00:02.000Z",
    "response": "404",
    "bytes": 2916,
    "clientip": "10.124.49.22",
    "@version": "1",
    "beat": {
      "hostname": "new",
      "name": "new",
      "version": "5.2.1"
    },
    "host": "new",
    "httpversion": "1.1",
    "timestamp": "22/Feb/2017:23:00:02 +0100"
  },
  "fields": {
    "@timestamp": [
      1487800802000
    ]
  },
  "sort": [
    1487800802000 
  ]
}

We've digressed from the original topic. Please start a new thread for your question.

(Hint: Where's the path field you're attempting to parse with grok?)
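Spelling the hint out: the event above has no path field; Filebeat delivers the file name in the source field instead (see "source" in the document above). Matching against source should therefore populate project. An untested sketch:

grok {
    match => { "source" => "/tmp/toto/(?<project>[^/]+)/" }
}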