The fields var.logfile and var.index_name are custom and will replace the placeholders ${logfile} and ${index_name} in the template.conf file.
Here is the template.conf file:
input {
  google_cloud_storage {
    bucket_id => "my-bucket"
    interval => 5
    json_key_file => "/etc/gcloud/credencial.json"
    file_matches => "${logfile}" # <<< value of var.logfile here
  }
}

filter {
  # Use grok to process log entries
  grok {
    match => {
      "message" => "\[%{LOGLEVEL:severity}\s*\] %{DATESTAMP:timestamp} %{DATA:origin} - %{GREEDYDATA:msg}"
    }
    pattern_definitions => {
      "LOGLEVEL" => "INFO|WARN|ERROR"
    }
  }
}

output {
  elasticsearch {
    hosts => ["https://my_es_host:443"]
    user => "user"
    password => "pwd"
    index => "${index_name}" # <<< value of var.index_name here
    ssl_enabled => true
  }
}
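To make the question concrete, this is roughly the kind of pipelines.yml I had in mind, with one entry per pipeline reusing the same template.conf (a hypothetical sketch: the var.* keys and the paths are exactly the part I do not know is supported):

# Hypothetical pipelines.yml sketch; the var.* keys may not exist in Logstash.
- pipeline.id: user01
  path.config: "/etc/logstash/template.conf"   # shared template (path is illustrative)
  var.logfile: "prod/cs/user01.log"            # would fill ${logfile}
  var.index_name: "index_user01"               # would fill ${index_name}
- pipeline.id: user02
  path.config: "/etc/logstash/template.conf"
  var.logfile: "prod/cs/user02.log"
  var.index_name: "index_user02"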
Is this possible?
Can I use one template.conf for all the pipelines and substitute only the values I need, so that I can reuse the content of template.conf without duplicating it for every pipeline?
No, this is not possible with Logstash; you cannot reuse a configuration like that, so you need to duplicate it.
But you do not need to create thousands of files by hand: you can use something like Ansible to generate them for you and even create their entries in pipelines.yml.
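For example, something along these lines with Ansible's template module could render one config per user (a rough sketch: the paths, the users list, and the pipeline.conf.j2 name are assumptions, and that template would be your template.conf with Jinja2 placeholders instead of ${logfile} and ${index_name}):

# Rough Ansible sketch; paths, file names and the users list are assumptions.
- name: Render one Logstash pipeline config per user
  ansible.builtin.template:
    src: pipeline.conf.j2                        # template.conf with {{ logfile }} / {{ index_name }}
    dest: "/etc/logstash/conf.d/{{ item }}.conf"
  loop: "{{ users }}"                            # e.g. users: [user01, user02, user03]
  vars:
    logfile: "prod/cs/{{ item }}.log"
    index_name: "index_{{ item }}"

- name: Render pipelines.yml with one entry per user
  ansible.builtin.template:
    src: pipelines.yml.j2                        # loops over the same users list
    dest: /etc/logstash/pipelines.yml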
Also, 1000 pipelines in a single Logstash instance may not be ideal and can cause performance issues.
Can you provide more context about your use case? Do you really need thousands of indices, one per file? That is not good practice.
Each input file is a log file. They have the same format, and the naming convention is [username].log.
Each user has a log file.
I want to isolate each user into a separate index in Elasticsearch.
For example:
user01.log → index_user01
user02.log → index_user02
user03.log → index_user03
and so on...
The .log files are in GCP Cloud Storage and are updated in real time.
Logstash monitors new entries in each file and sends them to Elasticsearch.
What would be the best strategy?
A single pipeline for all 1000 files?
But how can I differentiate the indices in the output?
The only information I have to differentiate them is in the filename, not in the log content.
I tried the following approach, but it took too long to start collecting and sending data.
Input configuration:
file_matches => "prod/cs/*.log"
In the filter, extract the filename and remove the .log extension to use it as the index name:
grok {
  match => { "path" => ".*\/(?<index_name>[^\/]+)\.log$" }
}
In the output, set the dynamic index name based on the filename:
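Roughly like this (illustrative sketch using the index_name field captured by the grok above; the connection details are the placeholder values from template.conf):

output {
  elasticsearch {
    hosts => ["https://my_es_host:443"]
    user => "user"
    password => "pwd"
    index => "index_%{index_name}"   # e.g. user01.log -> index_user01
    ssl_enabled => true
  }
}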
What is the content of these log files? Do they all have the same event structure?
If so, you could have just one index and add the filename as a field in the event using Logstash.
You have the path of the file, so you can extract the user from it and add a new field like user_name with that information; this will allow you to filter the events by user, as in the sketch below.
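For example, something like this, mirroring the grok you already have but capturing the value as user_name (the exact path field name depends on what the input plugin provides):

filter {
  grok {
    # e.g. "prod/cs/user01.log" -> user_name = "user01"
    match => { "path" => ".*\/(?<user_name>[^\/]+)\.log$" }
  }
}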
The Logstash part works, but it is not a good strategy because you would still have one index per user. This will not scale: it can lead to a lot of shards in your cluster and a lot of small indices.
As mentioned, if the event structure is the same, you can use just one index and add the user information to the event before indexing the data.
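The output then points at a single shared index for every file (sketch; the index name and connection details are illustrative):

output {
  elasticsearch {
    hosts => ["https://my_es_host:443"]
    user => "user"
    password => "pwd"
    index => "user-logs"   # one index for all users; filter by user_name when querying
    ssl_enabled => true
  }
}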
According to the documentation, yes, you can use wildcards in the file_matches option.
One thing that you mentioned is this: "The .log files are in GCP Cloud Storage and are updated in real time."
To be honest, I'm not sure this will work with this plugin, as the files are in object storage and the plugin is meant to download complete files, not files that are still being written.
If you are updating the files in the object storage and expect Logstash to pick up the new lines, I don't think this will work.