How can I define multiple indices in the Logstash configuration file based on the input type?


(sahere rahimi) #1

Hi all,
I am using Logstash to receive files from Filebeat and ship them to two Elasticsearch nodes. Logstash has two inputs with different types, and I want to define different indices based on the input type, so I wrote the following logstash.conf file:

input {
  beats {
    port => 5044
    host => "192.168.170.146"
    type => "VM1"
  }
  beats {
    port => 5044
    host => "192.168.170.140"
    type => "VM2"
  }
}
output {
  elasticsearch {
    hosts => ["192.168.170.146:9200","192.168.170.140:9200"]
    if [type]=="VM1" {
      index => ["VM1-%{+yyyy.MM.dd}"]
    } else {
      index => ["VM2-%{+yyyy.MM.dd}"]
    }
  }
  stdout { codec => rubydebug }
}

When I execute the command (./logstash -f logstash.conf), Logstash stops and the following error can be seen. Could you please advise me about this error?

[srahimi@imguatapp01 bin] ./logstash -f logstash.conf
Sending Logstash logs to /home/srahimi/ELK/logstash-6.4.2/logs which is now configured via log4j2.properties
[2018-11-18T09:11:54,459][WARN ][logstash.config.source.multilocal] Ignoring the 'pipelines.yml' file because modules or command line options are specified
[2018-11-18T09:11:55,178][INFO ][logstash.runner ] Starting Logstash {"logstash.version"=>"6.4.2"}
[2018-11-18T09:11:55,759][ERROR][logstash.agent ] Failed to execute action {:action=>LogStash::PipelineAction::Create/pipeline_id:main, :exception=>"LogStash::ConfigurationError", :message=>"Expected one of #, => at line 16, column 11 (byte 268) after output {\n elasticsearch {\n hosts => [\"192.168.170.146:9200\",\"192.168.170.140:9200\"]\n if ", :backtrace=>["/home/srahimi/ELK/logstash-6.4.2/logstash-core/lib/logstash/compiler.rb:41:in `compile_imperative'", "/home/srahimi/ELK/logstash-6.4.2/logstash-core/lib/logstash/compiler.rb:49:in `compile_graph'", "/home/srahimi/ELK/logstash-6.4.2/logstash-core/lib/logstash/compiler.rb:11:in `block in compile_sources'", "org/jruby/RubyArray.java:2486:in `map'", "/home/srahimi/ELK/logstash-6.4.2/logstash-core/lib/logstash/compiler.rb:10:in `compile_sources'", "org/logstash/execution/AbstractPipelineExt.java:149:in `initialize'", "/home/srahimi/ELK/logstash-6.4.2/logstash-core/lib/logstash/pipeline.rb:22:in `initialize'", "/home/srahimi/ELK/logstash-6.4.2/logstash-core/lib/logstash/pipeline.rb:90:in `initialize'", "/home/srahimi/ELK/logstash-6.4.2/logstash-core/lib/logstash/pipeline_action/create.rb:38:in `execute'", "/home/srahimi/ELK/logstash-6.4.2/logstash-core/lib/logstash/agent.rb:309:in `block in converge_state'"]}
[2018-11-18T09:11:56,062][INFO ][logstash.agent ] Successfully started Logstash API endpoint {:port=>9600}
[srahimi@imguatapp01 bin]


(Christian Dahlqvist) #2

You cannot have conditionals inside a plugin definition, so you need to change that. The best way is probably to set a field in your input and then use this to build the index name:

input {
  beats {
    port => 5044
    add_field => { "[@metadata][type]" => "vm1" }
  }
}

output {
  elasticsearch {
    index => ["%{[@metadata][type]}-%{+yyyy.MM.dd}"]
  }
}

Note that index names should be lower case, and that the host specified for a beats input in the Logstash config is the interface the plugin will bind to, which must be local to the host.
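For completeness: conditionals are supported in the output section, as long as they wrap a whole plugin block rather than sit inside it. A minimal sketch of that alternative, reusing the hosts from the config above (note the lower-cased index names):

output {
  if [type] == "VM1" {
    elasticsearch {
      hosts => ["192.168.170.146:9200","192.168.170.140:9200"]
      index => "vm1-%{+yyyy.MM.dd}"
    }
  } else {
    elasticsearch {
      hosts => ["192.168.170.146:9200","192.168.170.140:9200"]
      index => "vm2-%{+yyyy.MM.dd}"
    }
  }
}

This duplicates the elasticsearch block per branch, which is why setting a field and interpolating it into a single index option is usually the cleaner approach.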


(sahere rahimi) #3

Many thanks.
Do you mean that Logstash and Filebeat should run on the same machine, and that remote access from Filebeat to Logstash doesn't work?


(Christian Dahlqvist) #4

Of course remote access from Filebeat to Logstash works; that is how it is most commonly deployed. If you do not specify an IP address for the beats input, it will bind to all available interfaces and act as a server, accepting connections from all Beats. There is no need to define an input per Beat connecting to it. If you want to see where the data is coming from, you can look at the host field in the data the Beats send.
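A minimal sketch of that single-input setup: one beats input with no host option (so it binds to all interfaces and accepts every Filebeat), plus a rubydebug output to inspect the events and see where the host information lives (the exact field name, e.g. host or beat.hostname, depends on the Beats version):

input {
  beats {
    port => 5044    # no host option: binds to all interfaces
  }
}
output {
  stdout { codec => rubydebug }   # inspect events to find the host field
}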


(sahere rahimi) #5

The reason for defining separate inputs in Logstash is to set a different index for each of them.
I have two VMs with IPs 192.168.170.146 and 192.168.170.140.
On each VM I have installed Filebeat, Logstash and Elasticsearch; the two Elasticsearch nodes form a cluster (for high availability). The setup is as follows:

1- The Filebeat on each VM (for example, the Filebeat installed on the first VM, 192.168.170.146) sends that VM's logs to both Logstash instances (the ones installed on 192.168.170.146 and 192.168.170.140). Therefore the following configuration is used in the filebeat.yml of each VM:
#----------------------------- Logstash output --------------------------------
output.logstash:
  hosts: ["192.168.170.146:5044","192.168.170.140:5044"]

2- The Logstash on each VM listens on port 5044 to receive the files from both Filebeats, so the following configuration is used in the logstash.conf file of both VMs:

input {
  beats {
    port => 5044
    host => "192.168.170.146"
    add_field => { "[@metadata][type]" => "vm1" }
  }
  beats {
    port => 5044
    host => "192.168.170.140"
    add_field => { "[@metadata][type]" => "vm2" }
  }
}
output {
  elasticsearch {
    hosts => ["192.168.170.146:9200","192.168.170.140:9200"]
    index => ["%{[@metadata][type]}-%{+yyyy.MM.dd}"]
  }
  stdout { codec => rubydebug }
}

So, I want to define different indices for these VMs; in effect, one index per Filebeat input.
3- An Elasticsearch node has been installed on each VM, so we have two Elasticsearch nodes forming a cluster.

4- Kibana is installed on one of the VMs (192.168.170.146).
Given the above, are the configurations OK? I run the ELK stack on both VMs, but in Kibana I cannot find the indices defined in logstash.conf.


(Christian Dahlqvist) #6

The host a beats input binds to must be local to the machine Logstash runs on, which is not the case here. You can however achieve this by leaving out the host parameter and using different ports instead. Be aware that this will scale badly.
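A sketch of the port-per-source variant on each Logstash node, assuming 5045 as the second port (an arbitrary choice): each input binds to all interfaces, and each Filebeat would then be pointed at its own port (vm1's Filebeat at port 5044 on both hosts, vm2's at 5045):

input {
  beats {
    port => 5044
    add_field => { "[@metadata][type]" => "vm1" }
  }
  beats {
    port => 5045
    add_field => { "[@metadata][type]" => "vm2" }
  }
}
output {
  elasticsearch {
    hosts => ["192.168.170.146:9200","192.168.170.140:9200"]
    index => "%{[@metadata][type]}-%{+yyyy.MM.dd}"
  }
}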


(sahere rahimi) #7

What do you mean by "scale badly"?


(Christian Dahlqvist) #8

If the number of Beats connecting were to grow and you added an input for each of them, you would probably eventually run into problems. If you decide to scale out, I would instead recommend looking at the data sent and determining the index based on that. You may also not want an index per host if the number of hosts were to grow, as this can result in a lot of small shards and be very inefficient.
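A sketch of the data-driven approach: a single input, with the index name derived from a field the Beat already sends instead of from the input definition. The exact field name depends on the Beats version (it may be beat.hostname or host), so treat [beat][hostname] here as an assumption to verify against your own events:

input {
  beats {
    port => 5044
  }
}
output {
  elasticsearch {
    hosts => ["192.168.170.146:9200","192.168.170.140:9200"]
    # index per originating host, taken from the event itself
    index => "%{[beat][hostname]}-%{+yyyy.MM.dd}"
  }
}

If the number of hosts grows, a single shared index (e.g. index => "filebeat-%{+yyyy.MM.dd}") with the hostname kept as a searchable field avoids the many-small-shards problem mentioned above.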