Splitting for Logstash working intermittently


Currently, I have Logstash configured for splitting. The issue is that sometimes it works, and sometimes it goes 10-15 minutes without any output (it's supposed to produce output every 5 minutes).

On the backend, I have scheduled a cron job that calls an API and writes the response to a file in JSON format. Logstash then reads the file and splits the "result" field.
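For reference, each line the cron job writes looks roughly like the sketch below. The field names other than "result" are illustrative, not the actual API schema; the only thing the split filter relies on is that "result" is an array:

```json
{"queried_at": "2023-01-01T00:00:00Z", "result": [{"id": 1}, {"id": 2}]}
```

After the json and split filters run, each element of "result" becomes its own event.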

Here's my Logstash code:

input {
  file {
    path => ["/var/log/logstash/abc_api-*.json"]
    start_position => "end"
    sincedb_path => "/dev/null"
  }
}

filter {
  json {
    source => "message"
  }
  split {
    field => "result"
  }
}

output {
  elasticsearch {
    ssl => true
    ssl_certificate_verification => false
    cacert => "/etc/logstash/elasticsearch-ca.pem"
    hosts => ""
    user => "${LS_USER}"
    password => "${LS_PWD}"
    manage_template => true
    index => "abc-logs-%{+YYYY.MM.dd}"
    pipeline => "abc-api"
  }
  # stdout { codec => rubydebug { metadata => true } }
}

I've searched the forum and I think this post helps, but I don't fully understand the "splitData.rb" code, so I haven't implemented it. It seems to remove the "message" field, but I could be wrong.

Also, I have configured Logstash with 4GB of heap memory. The JSON array currently has about 500 items (it will grow to ~5000). I usually use Filebeat for all things Elastic (it's less resource-intensive), but Filebeat does not support splitting, so this is an exception.

No. That code was only needed because there was an issue in the split filter that caused excessive memory use. That issue has since been fixed.

Have you confirmed that there are files that match the regexp that have not been read by logstash?

Hi Badger,

The API calls always return the same JSON structure. In the screenshot below there is a gap in the output, but I'm not sure what happened; I'll keep monitoring it. Your confirmation on the memory-usage fix is already good enough, since I can now rule that out of my troubleshooting.

Perhaps I'll work on the pipeline a little. Right now, each document's @timestamp in Elastic is based on ingest time. Maybe I can take the timestamp directly from the API query instead, since the API result includes a timestamp value.
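A sketch of what that might look like: a date filter after the split, mapping the API's timestamp field onto @timestamp. The field name "timestamp" and the ISO8601 format below are assumptions; they would need to match whatever the API actually returns:

```
filter {
  date {
    # "timestamp" is a placeholder for the actual field name in the API result
    match => ["timestamp", "ISO8601"]
    target => "@timestamp"
  }
}
```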

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.