Slowing Down File Ingest

Is there a way to slow down the rate at which the file input ingests data from a file before delivering it to the filters? The problem I have is that the file input, using the multiline codec, is feeding data to my XML filter faster than the XML filter can process it, causing data to be dropped. Unfortunately, I cannot use the persistent queue (disk queueing) because it seems to be broken when used in conjunction with the XML filter. For testing, I have a file that results in about 1,800 lines; the XML filter pulls about 26 fields from each event, and geo-ip lookups are then performed, creating more fields. This file takes a good minute to make it all the way through the file input, and during this time Kibana > Monitoring > Logstash > Advanced shows a queue depth of 1.
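
For reference, the filter side is an xml filter followed by geoip, roughly along the lines of the sketch below; the xpath expressions and field names here are just placeholders, the real filter pulls about 26 fields.

filter {
  xml {
    source => "message"
    store_xml => false
    # Placeholder xpaths; the actual config extracts roughly 26 fields per record.
    xpath => [
      "/record/row/source_ip/text()", "source_ip",
      "/record/row/count/text()", "msg_count"
    ]
  }
  # Enrich with GeoIP data; xpath results may come back as arrays, and the
  # geoip filter only uses the first value of an array.
  geoip {
    source => "source_ip"
  }
}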

The problem I have is that the file input, using the multiline codec, is feeding data to my XML filter faster than the XML filter can process it, causing data to be dropped.

How did you reach that conclusion? Logstash doesn't work that way.

I feed it smaller files and there's no issue; the data is ingested. I feed it the larger 1,800-line file, the debug log shows multiline working and the Logstash queue goes up to 1, but nothing ever displays in Kibana Discover. Even hours later there's nothing.

What if you comment out the filters, i.e. does the problem seem to be related to the ingestion or the filters?

Well shit....if I take out the filtering, the data still doesn't show up in Kibana Discover. Watching the Logstash debug log, I am seeing [2018-03-08T16:49:37,985][DEBUG][logstash.pipeline ] output received {"event"... a ton, so it seems like Logstash is delivering it. Below is the pipeline I am using, excluding the filtering:

input {
  file {
    id => "Ingest\DMARC\*.xml"
    path => "D:/ElasticStack/Ingest/DMARC/*.xml"
    discover_interval => 5
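    # Any line that does not contain <record> is appended to the previous
    # event; auto_flush_interval closes a pending event after 5 idle seconds.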
    codec => multiline {
      auto_flush_interval => 5
      negate => true
      pattern => "<record>"
      what => "previous"
    }
  }
}
output {
  elasticsearch {
    id => "Send to Elasticsearch"
    hosts => ["FQDN:9200"]
# Uncomment below and configure for XPack integration.
#    user => "elastic"
#    password => "elastic"
    http_compression => true
    template => "D:/ElasticStack/Apps/Logstash/templates/dmarcxmltemplate.json"
    template_name => "dmarcxml"
    index => "dmarcxml-%{+YYYY.MM.dd}"
  }
}
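
One way to rule out the Elasticsearch/Kibana side entirely (not part of the config above, just a debugging aid) is a second output section that prints every event as Logstash emits it:

output {
  # Dump each event to the console so delivery can be checked without Elasticsearch.
  stdout {
    codec => rubydebug
  }
}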

Blah....I've been looking at the wrong log to identify the issue. Opened up the Elasticsearch log and I get all kinds of failed-to-index errors because one of my date fields has a space at the end of it. This is, apparently, an inconsistency between reporting MTAs... The date format is epoch seconds. Any ideas on how to remove the space? I tried the below, but it's not removing the space; should I be using a set of characters to represent the space?

  mutate {
    convert => {
      "report.start" => "integer"
      "report.end" => "integer"
    }
    gsub => [
      "report.start", " ", "",
      "report.end", " ", ""
    ]
  }
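
One thing worth noting: within a single mutate filter, Logstash applies convert before gsub no matter how they are written, so the conversion above runs on the raw value. A minimal sketch that strips trailing whitespace first, assuming the fields really are named report.start and report.end as above:

  mutate {
    # Strip trailing whitespace first; \s covers spaces, tabs, and \r.
    gsub => [
      "report.start", "\s+$", "",
      "report.end", "\s+$", ""
    ]
  }
  mutate {
    # Convert in a separate mutate block so it runs after the gsub above.
    convert => {
      "report.start" => "integer"
      "report.end" => "integer"
    }
  }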

Forgot about the strip function, but that's not working either. Here's what the JSON output looks like...not sure how to get that space off the back end...any ideas?

<date_range>\r\n <begin>1517875200</begin>\r\n <end>1517961599 </end>\r\n </date_range>

Oh...another interesting bit...looking at the original XML file before it gets ingested....there's no space.... I guess that explains why the mutate functions aren't doing anything...but it doesn't explain what's putting a space in the field...

Hate when I do dumb things....

The data is using the report.start field as the timestamp. I was making changes, importing the data, and then watching the bar graph at the top of the Discover page increment. Unfortunately, the record at the top wasn't a record from the latest ingest test...so I was looking at the same event over and over :man_facepalming:
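
For context, since report.start is the time field, Discover sorts by the report's own date range rather than by when the data was ingested, which is why the latest test run wasn't at the top. In the pipeline, that mapping looks something like this sketch (hypothetical, assuming report.start holds epoch seconds):

  date {
    # "UNIX" parses epoch seconds into the event @timestamp, so Discover
    # orders events by report time, not ingest time.
    match => ["report.start", "UNIX"]
    target => "@timestamp"
  }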

Time to feed it all my data....see if anything else is busted....sorry for the red herring.
