Regarding Logstash file output settings

Hi

Can you tell me about file archiving settings in Logstash?
I'm creating a pipeline configuration file as shown below.
With this configuration, no log file (*.log) is output.
What could be the cause? Also, what should I check?

[root@logstash-1 ~]# cat /etc/logstash/conf.d/syslog-pipeline.conf

input {
  syslog { port => 5140 }
}

filter {
  grok {
    match => {
      "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}"
    }
  }
}

output {
  elasticsearch {
    hosts => ["http://192.168.1.xxx:9200", "http://192.168.1.yyy:9200"]
    user => "elastic"
    password => "<password>"
    index => "syslog-%{+YYYY.MM.dd}"
  }

  file {
    path => "/var/log/logstash-archive/%{+YYYY}/%{+MM}/%{+dd}/%{host.hostname}.log"
    codec => "json_lines"
  }
}
  • Other information
    The owner of the output folder (/var/log/logstash-archive) is the "logstash" user.
    Is this a problem?
[root@logstash-1 ~]# ll /var/log | grep logstash-archive
drwxr-xr-x. 2 logstash logstash 6 Oct 2 10:58 logstash-archive
  • Additional information
    There are no folders under /var/log/logstash-archive.
    Is it possible to create such folders dynamically?

The owner of the output folder (/var/log/logstash-archive) is the "logstash" user.
Is this a problem?

No, that's fine if you run Logstash in service mode. If you run it as a standalone process, the user that starts the Logstash command must have write permission on that directory.
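If you want to double-check, here's a quick sketch, assuming the service runs as the logstash user (the .write-test file name is just an example):

# Try creating and removing a file in the archive directory as the logstash user
sudo -u logstash touch /var/log/logstash-archive/.write-test \
  && echo "writable" \
  && rm /var/log/logstash-archive/.write-test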

Is it possible to create such folders dynamically?

AFAIK, Logstash will not create directories, only files.
You have set up your syslog-pipeline.conf correctly.
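If it did turn out that the directories had to exist beforehand, you could pre-create today's dated path yourself, e.g. (a sketch; the date is an example):

# Create the dated directory tree, owned by the logstash user (example date)
install -d -o logstash -g logstash /var/log/logstash-archive/2025/10/07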

You should do:

  • Check the Logstash logs in /var/log/logstash; there should be a trace something like ... TCP/UDP listener has been started on port 5140
    Note: this input starts listeners on both TCP and UDP. That means that once Logstash is listening, you can use telnet on localhost to connect to port 5140 and send random characters; the events will then carry the tag _grokparsefailure (see the quick test after this list).
  • Add debug mode to see what's happening and whether any traffic is reaching Logstash. Add this to the output:
output {
  stdout { codec => rubydebug }

  elasticsearch {
    hosts => [....
  }

  file {
    path => "/var/log/logstash-archive/%{+YYYY}/%{+MM}/%{+dd}/%{[host][hostname]}.log"
    codec => "json_lines"
  }
}
  • Correct the [host][hostname] field reference as shown above
  • Check the local or network firewall logs
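For the quick test mentioned above, here is a minimal sketch using netcat (assuming Logstash runs locally):

# Send one test line over TCP and one over UDP to the syslog input on port 5140
echo "test message" | nc -w1 localhost 5140
echo "test message" | nc -u -w1 localhost 5140

Each line should then show up in the rubydebug output, carrying a grok parse failure tag, since plain text is not valid syslog.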

To establish whether Logstash would create the files, just (temporarily) add another output to /path/to/an/existing-directory/file.out

I just did so (v8.19.4):

$ cat /etc/logstash/conf.d/x.conf
input {
  syslog { port => 5140 }
}
output {
  file {
    path => "/var/log/logstash/%{+YYYY}/%{+MM}/%{+dd}/log1.log"
    codec => "json_lines"
  }
  file {
    path => "/var/log/logstash/%{+YYYY}_%{+MM}_%{+dd}_log2.log"
    codec => "json_lines"
  }
}

$ systemctl restart logstash.service

$ echo '{"key1":"value1"}' | nc -u -w1 localhost 5140

$ cat /var/log/logstash/2025_10_06_log2.log
{"message":"{\"key1\":\"value1\"}\n","tags":["_grokparsefailure_sysloginput"],"service":{"type":"system"},"log":{"syslog":{"priority":13,"facility":{"code":1,"name":"user-level"},"severity":{"code":5,"name":"Notice"}}},"event":{"original":"{\"key1\":\"value1\"}\n"},"@timestamp":"2025-10-06T16:47:46.853737458Z","host":{"ip":"127.0.0.1"},"@version":"1"}

$ cat /var/log/logstash/2025/10/06/log1.log
{"message":"{\"key1\":\"value1\"}\n","tags":["_grokparsefailure_sysloginput"],"service":{"type":"system"},"log":{"syslog":{"priority":13,"facility":{"code":1,"name":"user-level"},"severity":{"code":5,"name":"Notice"}}},"event":{"original":"{\"key1\":\"value1\"}\n"},"@timestamp":"2025-10-06T16:47:46.853737458Z","host":{"ip":"127.0.0.1"},"@version":"1"}

So clearly in this case it created the required directories.


You'll know that syslog usually uses port 514, not 5140. On the logstash server, you can make sure you are actually receiving data on port 5140 with tcpdump or similar tools. Or, as @Rios suggested, generate some traffic yourself with telnet, netcat, or even bash.
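For example, a minimal tcpdump invocation (run as root on the Logstash host):

# Watch for inbound traffic to port 5140 (TCP or UDP) on any interface
tcpdump -nn -i any port 5140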

iptables and/or other firewalls may need to be disabled / re-configured.
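On a firewalld-based system, for instance, opening the port might look like this (a sketch; adapt it to whatever firewall you actually run):

# Open the syslog input port for TCP and UDP, then reload the rules
firewall-cmd --permanent --add-port=5140/tcp
firewall-cmd --permanent --add-port=5140/udp
firewall-cmd --reload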

Obviously, if the data is reaching elasticsearch's syslog-* indices then it is flowing, is that the case?
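One way to check that, reusing the credentials from your pipeline config:

# List any syslog-* indices and their document counts
curl -u elastic:<password> 'http://192.168.1.xxx:9200/_cat/indices/syslog-*?v'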

See my other response.

And a belated "Welcome to the forum!! @sysrq_1231 !!"

(And, in passing, just to check: you have 2 elasticsearch IPs listed. I hope those are 2 of the N elasticsearch nodes in your cluster, where N >= 3?)
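To see the full cluster membership and node roles, something like this works (again with the elastic credentials):

# Show all nodes in the cluster and their roles
curl -u elastic:<password> 'http://192.168.1.xxx:9200/_cat/nodes?v'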


@Rios @RainTown
Thank you for answering my question!!

In conclusion,
I confirmed that files and directories are created with the following settings.
I was also able to verify the data in Elasticsearch.
(It seems it takes some time for the files and directories to be created,
and I apparently didn't wait long enough. My apologies!)

[root@logstash-1 ~]# cat /etc/logstash/conf.d/syslog-pipeline.conf

input { syslog { port => 5140 } }

filter {
  grok {
    match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
  }
}

output {
  elasticsearch {
    hosts => ["http://192.168.1.xxx:9200", "http://192.168.1.yyy:9200"]
    user => "elastic"
    password => "<password>"
    index => "syslog-%{+YYYY.MM.dd}"
  }
  file {
    path => "/var/log/logstash-archive/%{+YYYY}/%{+MM}/%{+dd}/%{[host][hostname]}.log"
    codec => "json_lines"
  }

  stdout { codec => rubydebug }
}

This time, I'm testing with the syslog port changed to 5140.
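For reference, a client-side rsyslog forwarding rule to this port would look something like the following (a sketch, assuming rsyslog as the sender; the file name and IP are placeholders):

# /etc/rsyslog.d/forward.conf (hypothetical): forward everything to the Logstash input
# A single @ forwards via UDP; use @@ for TCP.
*.* @192.168.1.zzz:5140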

[root@logstash-1 ~]# ll /var/log/logstash-archive/2025/10/07/
total 16
-rw-r--r--. 1 logstash logstash 12458 Oct  7 06:33 testserver.log

As mentioned above, the date directories were also created by Logstash.
(Initially, not even the "2025" directory existed.)
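To confirm the tree Logstash created, you can list it, e.g.:

# Show all directories under the archive root
find /var/log/logstash-archive -type d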


@RainTown
For this ELK Stack verification,
I am building the environment with the following configuration:
  • Elasticsearch Master Node x 3
  • Elasticsearch Data Node x 2
  • Logstash x 2
  • Kibana x 2

output {
  elasticsearch {
    hosts => ["http://192.168.1.xxx:9200", "http://192.168.1.yyy:9200"]
    user => "elastic"
    password => "<password>"
    index => "syslog-%{+YYYY.MM.dd}"
  }

In the above configuration, I listed the two data nodes in hosts (please let me know if this is incorrect).
I checked the following site, but considering resilience,
would it be better to have three data nodes as well?
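For reference, here is how the hosts line would look with a third data node added, since the elasticsearch output load-balances requests across all listed endpoints (192.168.1.zzz is a placeholder):

output {
  elasticsearch {
    # Logstash distributes requests across all hosts listed here
    hosts => ["http://192.168.1.xxx:9200", "http://192.168.1.yyy:9200", "http://192.168.1.zzz:9200"]
    ....
  }
}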