Slowing Down File Ingest

Is there a way to slow down the rate at which the file input ingests data from a file before delivering it to the filters? The problem I have is that the file input, using the multiline codec, is feeding data to my XML filter faster than the XML filter can process it, causing data to be dropped. Unfortunately, I cannot use the persistent queue (disk queueing) because it seems to be broken when used in conjunction with the XML filter. For testing, I have a file that results in about 1,800 lines; the XML filter pulls about 26 fields from each event, and geo-ip lookups are then performed, creating more fields. This file takes a good minute to make it all the way through the file input, and during this time Kibana > Monitoring > Logstash > Advanced shows a queue depth of 1.
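
For reference, the filter side is an xml filter followed by geoip, roughly along the lines of the sketch below; the xpath expressions and field names here are just placeholders, the real filter pulls about 26 fields.

filter {
  xml {
    source => "message"
    store_xml => false
    # Placeholder xpaths; the actual config extracts roughly 26 fields per record.
    xpath => [
      "/record/row/source_ip/text()", "source_ip",
      "/record/row/count/text()", "msg_count"
    ]
  }
  # Enrich with GeoIP data; xpath results may come back as arrays, and the
  # geoip filter only uses the first value of an array.
  geoip {
    source => "source_ip"
  }
}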

The problem I have is that the file input, using the multiline codec, is feeding data to my XML filter faster than the XML filter can process it, causing data to be dropped.

How did you reach that conclusion? Logstash doesn't work that way.

I feed it smaller files and there's no issue; the data is ingested. I feed it the larger 1,800-line file, the debug log shows multiline working and the Logstash queue goes up to 1, but nothing ever displays in Kibana Discover. Even hours later there's nothing.

What if you comment out the filters, i.e. does the problem seem to be related to the ingestion or the filters?

Well shit....if I take out the filtering, the data still doesn't show up in Kibana Discover. Watching the Logstash debug log, I am seeing [2018-03-08T16:49:37,985][DEBUG][logstash.pipeline ] output received {"event"... a ton, so it seems like Logstash is delivering it. Below is the pipeline I am using, excluding the filtering:

input {
  file {
    id => "Ingest\DMARC\*.xml"
    path => "D:/ElasticStack/Ingest/DMARC/*.xml"
    discover_interval => 5
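    # Any line that does not contain <record> is appended to the previous
    # event; auto_flush_interval closes a pending event after 5 idle seconds.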
    codec => multiline {
      auto_flush_interval => 5
      negate => true
      pattern => "<record>"
      what => "previous"
    }
  }
}
output {
  elasticsearch {
    id => "Send to Elasticsearch"
    hosts => ["FQDN:9200"]
# Uncomment below and configure for XPack integration.
#    user => "elastic"
#    password => "elastic"
    http_compression => true
    template => "D:/ElasticStack/Apps/Logstash/templates/dmarcxmltemplate.json"
    template_name => "dmarcxml"
    index => "dmarcxml-%{+YYYY.MM.dd}"
  }
}
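
One way to rule out the Elasticsearch/Kibana side entirely (not part of the config above, just a debugging aid) is a second output section that prints every event as Logstash emits it:

output {
  # Dump each event to the console so delivery can be checked without Elasticsearch.
  stdout {
    codec => rubydebug
  }
}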

Blah....I've been looking at the wrong log to identify the issue. Opened up the Elasticsearch log and I get all kinds of failed-to-index errors because one of my date fields has a space at the end of it. This is, apparently, an inconsistency between reporting MTAs... The date format is epoch seconds. Any ideas on how to remove the space? I tried the below, but it's not removing the space; should I be using a set of characters to represent the space?

  mutate {
    convert => {
      "report.start" => "integer"
      "report.end" => "integer"
    }
    gsub => [
      "report.start", " ", "",
      "report.end", " ", ""
    ]
  }
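
One thing worth noting: within a single mutate filter, Logstash applies convert before gsub no matter how they are written, so the conversion above runs on the raw value. A minimal sketch that strips trailing whitespace first, assuming the fields really are named report.start and report.end as above:

  mutate {
    # Strip trailing whitespace first; \s covers spaces, tabs, and \r.
    gsub => [
      "report.start", "\s+$", "",
      "report.end", "\s+$", ""
    ]
  }
  mutate {
    # Convert in a separate mutate block so it runs after the gsub above.
    convert => {
      "report.start" => "integer"
      "report.end" => "integer"
    }
  }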

Forgot about the strip function, but that's not working either. Here's what the JSON output looks like...not sure how to get that space off the back end...any ideas?

<date_range>\r\n <begin>1517875200</begin>\r\n <end>1517961599 </end>\r\n </date_range>

Oh...another interesting bit...looking at the original XML file before it gets ingested....there's no space.... I guess that explains why the mutate functions aren't doing anything...but it doesn't explain what's putting a space in the field...

Hate when I do dumb things....

The data is using the report.start field as the timestamp. I was making changes, importing the data, and then watching the bar graph at the top of the Discover page increment. Unfortunately, the record at the top wasn't a record from the latest ingest test...so I was looking at the same event over and over :man_facepalming:
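
For context, since report.start is the time field, Discover sorts by the report's own date range rather than by when the data was ingested, which is why the latest test run wasn't at the top. In the pipeline, that mapping looks something like this sketch (hypothetical, assuming report.start holds epoch seconds):

  date {
    # "UNIX" parses epoch seconds into the event @timestamp, so Discover
    # orders events by report time, not ingest time.
    match => ["report.start", "UNIX"]
    target => "@timestamp"
  }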

Time to feed it all my data....see if anything else is busted....sorry for the red herring.
