How to handle multiple inputs with Logstash to different indices

Directory Structure:

Results
  Project1
    RUN1
    RUN2
  Project2
    RUN1
    RUN2

"Results" directory contains Project1 & Project2 sub directories. Also there might be more "Project....n" sub dir gets created depending upon test run for several projects.

Each project directory contains more than one RUN directory.

I want to process each project directory as and when it is created, and send its output to a different index in Elasticsearch.

e.g. for Project1 the index should be set to "Logstash-Project1-%{+YYYY.MM.dd}", for Project2 to "Logstash-Project2-%{+YYYY.MM.dd}", and so on.

How do I handle the input and output?

input {
  file {
    path => "/Results/Project*/RUN*/*.csv"
    start_position => "beginning"
  }
}

output {
  elasticsearch {
    action => "index"
    host => "localhost"
    index => "logstash-%{+YYYY.MM.dd}"
  }
}


Use the grok filter to extract the project name from the input file path (stored in the path field), then reference that field when setting the index pattern of the elasticsearch output.

elasticsearch {
  ...
  index => "logstash-%{project}-%{+YYYY.MM.dd}"
}

I don't know how to use the grok filter to extract the project name from the input file path. What pattern do I need to extract the project from the path? Any help?

If you want to extract the first directory component below a directory named Results, this is untested but should work:

grok {
  match => ["path", "/Results/(?<project>[^/]+)/"]
}
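Wired into the full pipeline, that looks roughly like this (untested sketch; the mutate step is an extra suggestion, since Elasticsearch index names must be lowercase and the "Logstash-Project1-..." names above are mixed-case):

filter {
  grok {
    match => ["path", "/Results/(?<project>[^/]+)/"]
  }
  # Normalize the captured directory name ("Project1" -> "project1"),
  # because Elasticsearch index names must be lowercase.
  mutate {
    lowercase => ["project"]
  }
}

output {
  elasticsearch {
    action => "index"
    host => "localhost"
    index => "logstash-%{project}-%{+YYYY.MM.dd}"
  }
}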

Thank you so much, you saved my day!

Sorry to disturb you again.
But I've since learned that the file path is actually something like:
path => "/opt/Results/*/Run/webobj.csv"

The grok filter I want should extract the project name from the /*/ component of the path.

Can you please suggest an appropriate match for getting the * value immediately after the "Results/" directory?

Is there any way I can generate a grok pattern to match?

Your help is much appreciated. Thanks again.

The wildcard pattern in the input's path option has nothing to do with the path field on events. That field's value is the name of the actual file a particular log message came from.
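So for a file like /opt/Results/Project1/Run/webobj.csv the path field contains that full name, and something like this (untested) should capture the directory immediately after Results:

grok {
  match => ["path", "/opt/Results/(?<project>[^/]+)/"]
}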

Great help, I got it working:

grok {
  match => ["path", "/opt/Log/Results/(?<project>[^/]+)/"]
}

Thanks

On Linux it worked, so I thought I'd run it on Windows using

grok {
  match => ["path", "C:\Test\Result(?<project>[^/]+)"]
}

This didn't work. Any idea why?

Backslashes are metacharacters in regexps, so to match literal backslashes you need to use "\\".

grok {
  match => ["path", "C:\Test\Result\(?<project>[^\]+)\"]
}

When I run Logstash with the above config (as you suggested) I get: Error: Expected one of #, {, ,, ] at line ...

It looks like you're escaping the closing double quote. This is what you need:

grok {
  match => ["path", "C:\\Test\\Result\\(?<project>[^\\]+)\\"]
}
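(Each \\ in the pattern matches a single literal backslash and [^\\]+ captures everything up to the next backslash, so a hypothetical path like C:\Test\Result\Project1\webobj.csv would yield project => "Project1".)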

I am using Logstash 2.3.2 and the "type" option is not working:

input {
  jdbc {
    jdbc_driver_library => ..
    jdbc_driver_class => ..
    jdbc_connection_string => ..
    jdbc_user => ..
    jdbc_password => ..
    statement => ..
    type => "deploy"
  }
}
output {
  if [type] == "deploy" {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "metricslog"
    }
  }
}

I find that grok can't make the "path" and "message" matches work together, like this:

match => {
  "message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}"
  "path" => "/logfile/(?<project>\w+?)/"
}

The "message" match will fail.

I suspect you'll want to split that grok filter in two to make sure both expressions are always evaluated.
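For example (untested), reusing the two expressions above; by default a grok filter stops after its first successful match (break_on_match defaults to true), so separate filters guarantee both always run:

grok {
  match => { "path" => "/logfile/(?<project>\w+?)/" }
}
grok {
  match => { "message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}" }
}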

@magnusbaeck

Hi Magnus,

First of all, thanks for your previous posts about the index. I've tried your suggestion but I don't know what I did wrong, as it doesn't work for me... (Please let me know if you'd prefer I open a new topic.)

I have two log files, stored in the directories /tmp/toto/first/ and /tmp/toto/second/. I want the index names to be distinguished by the project name ("first" and "second" in this case). Here are my configurations:

Filebeat:

...
  paths:
    - /tmp/toto/*/*.log
... 

Logstash

input {
    beats {
        port => "5043"
    }
}

filter {
    grok {
        match => { "path" => "/tmp/toto/(?<project>[^/]+)/" }
        match => { "message" => "%{COMBINEDAPACHELOG}"}
    }
    date {
    match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
  }

}
output {
    elasticsearch {
        hosts => [ "127.0.0.1:9200" ]
        index => [ "log-%{project}-%{+YYYY.MM.dd}" ]
    }
}

After starting all the services, it seems Elasticsearch doesn't resolve the project variable:

health status index                     uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   log-%{project}-2016.06.20 QNPYvvFqRzGFEC9_32da7g   5   1        450            0    807.6kb        807.6kb
yellow open   log-%{project}-2017.02.22 RzDgOdWKQXqmItnm0zNdGw   5   1        232            0    138.5kb        138.5kb

Do you have any ideas?

It looks like the event doesn't have a project field. Check what the events in the log-%{project}-2017.02.22 index look like.

@magnusbaeck

Here is an example of the output. I think you're right, I don't have the project field. Do you know how to correct it?

{
  "_index": "log-%{project}-2017.02.22",
  "_type": "log",
  "_id": "AVqAEzYEO-tgZd20LulR",
  "_score": null,
  "_source": {
    "request": "/",
    "agent": "\"Links (1.03; Linux 2.6.32-642.6.2.el6.x86_64 x86_64; dump)\"",
    "offset": 172883,
    "auth": "-",
    "ident": "-",
    "input_type": "log",
    "verb": "GET",
    "source": "/tmp/toto/first/access_20170222.log",
    "message": "10.124.49.22 - - [22/Feb/2017:23:00:02 +0100] \"GET / HTTP/1.1\" 404 2916 \"-\" \"Links (1.03; Linux 2.6.32-642.6.2.el6.x86_64 x86_64; dump)\" ",
    "type": "log",
    "tags": [
      "beats_input_codec_plain_applied"
    ],
    "referrer": "\"-\"",
    "@timestamp": "2017-02-22T22:00:02.000Z",
    "response": "404",
    "bytes": 2916,
    "clientip": "10.124.49.22",
    "@version": "1",
    "beat": {
      "hostname": "new",
      "name": "new",
      "version": "5.2.1"
    },
    "host": "new",
    "httpversion": "1.1",
    "timestamp": "22/Feb/2017:23:00:02 +0100"
  },
  "fields": {
    "@timestamp": [
      1487800802000
    ]
  },
  "sort": [
    1487800802000 
  ]
}

We've digressed from the original topic. Please start a new thread for your question.

(Hint: Where's the path field you're attempting to parse with grok?)
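Spelling the hint out: the event above has no path field; Filebeat delivers the file name in the source field instead (see "source" in the document above). Matching against source should therefore populate project. An untested sketch:

grok {
    match => { "source" => "/tmp/toto/(?<project>[^/]+)/" }
}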