LS intermittently shows zero EPS received/emitted

millap · April 7, 2019, 6:44pm

Hi All,

Looking for a little advice in trying to narrow down the source of a problem we have with our ELK instance.

Intermittently, we're seeing the received and emitted EPS on an LS node drop to zero, and we can't figure out why. The image below demonstrates what happens.

We've rolled out 6.7.1 for LS (along with ES and K on other nodes), but this hasn't fixed the issue. Other things we've tried:

Increased the heap.
Disabled some filter plugins in the conf files (jdbc_streaming, drop, geoip, cidr).
Stepped pipeline.batch.size through 125, 250, 500 and 1000, as the system is able to cope with it.
Ran bin/logstash-plugin update.

None of the above have fixed the issue, and I'm unsure where to look next. The heap graph looks 'wrong' to me when the issue occurs, but I don't know where to look to investigate that.

A tcpdump on the ingress interface shows that all input ports are receiving traffic (we use a mix of beats, syslog, and json).

Any ideas, or suggestions?

Cheers
Andy

Badger · April 7, 2019, 6:58pm

I do not think the heap graph indicates a problem. The long-term sawtooth pattern is caused by objects from Eden being promoted into the tenured generation. When the tenured generation is full, it runs a GC and frees all the garbage.

If LS stops processing events (which it does) then the promotion greatly slows down and you see a much shorter-term sawtooth pattern as the new generation fills and is GCd.

I suspect that the output is not accepting events, so the queues fill and back-pressure prevents LS from reading more events.

What sort of output are you using? Anything of interest in its logs?
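
One way to confirm a stall independently of the monitoring graphs is to sample the Logstash node stats API twice and compare the event counters. The following is only a minimal sketch: it assumes the API is reachable on the default port 9600 at `/_node/stats/events`, and the function names (`fetch_event_counters`, `events_per_second`, `looks_stalled`) are hypothetical helpers, not part of Logstash.

```python
import json
import urllib.request


def fetch_event_counters(base_url="http://localhost:9600"):
    """Fetch the cumulative event counters from the Logstash node stats API.

    Assumption: the API listens on the default port 9600 and the response
    contains an "events" object with "in" and "out" counters.
    """
    with urllib.request.urlopen(base_url + "/_node/stats/events") as resp:
        return json.load(resp)["events"]


def events_per_second(earlier, later, interval_seconds):
    """Return (received_eps, emitted_eps) between two counter snapshots."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    received = (later["in"] - earlier["in"]) / interval_seconds
    emitted = (later["out"] - earlier["out"]) / interval_seconds
    return received, emitted


def looks_stalled(earlier, later, interval_seconds):
    """True when neither counter moved during the sampling interval."""
    received, emitted = events_per_second(earlier, later, interval_seconds)
    return received == 0 and emitted == 0
```

Taking two snapshots with `fetch_event_counters()` about ten seconds apart and passing them to `looks_stalled` would show whether the pipeline has really stopped. If both rates are zero while tcpdump still shows inbound traffic, that is consistent with back-pressure from the output rather than a lack of input.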

millap · April 7, 2019, 7:19pm

Hi @Badger

Thanks a lot for the swift response.

There's nothing in the LS logs (logstash-plain.log) to show any issues when the problem occurs, nor in the clusterXX.log on the remote ES host.

We have an ongoing issue with a JunOS filter (kv plugin) which we're looking into, but it's been there since day #1, so possibly no relation -

0x73c952b9>], :response=>{"index"=>{"_index"=>"logstash-2019.04.07", "_type"=>"doc",
"_id"=>"ca81-WkBixSH8MB7Axct", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception",
"reason"=>"failed to parse", "caused_by"=>{"type"=>"illegal_argument_exception",
"reason"=>"object field starting or ending with a [.] makes object resolution ambiguous: [0..3]"}}}}}
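
For anyone reading the error above: the rejected field name is `0..3`, and splitting it on dots leaves an empty path segment, which appears to be what makes the object path ambiguous. The sketch below only illustrates that idea in Python. It is an approximation of the check, not Elasticsearch's actual code, and `has_empty_path_segment` and `sanitize_key` are hypothetical names.

```python
def has_empty_path_segment(field_name):
    """True if splitting on '.' leaves a blank segment.

    Covers names that start or end with a dot and names with
    consecutive dots, such as "0..3".
    """
    return any(part.strip() == "" for part in field_name.split("."))


def sanitize_key(field_name, replacement="_"):
    """Replace dots in a problematic key; leave well-formed keys alone."""
    if has_empty_path_segment(field_name):
        return field_name.replace(".", replacement)
    return field_name


# Keys as they might come out of a key/value parse (illustrative values only).
for key in ["0..3", "src.ip", ".hidden", "count"]:
    print(key, "->", sanitize_key(key))
```

In the pipeline itself, the equivalent would be to clean or drop such keys in the filter stage before the event reaches the elasticsearch output, so that the document is not rejected with a 400.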

In ES logs, I see -

0x3354a357>], :response=>{"index"=>{"_index"=>"logstash-2019.04.07", "_type"=>"doc", 
"_id"=>"rbY6-WkBixSH8MB7nLDi", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", 
"reason"=>"failed to parse", "caused_by"=>{"type"=>"illegal_argument_exception", 
"reason"=>"object field starting or ending with a [.] makes object resolution ambiguous: [0..7]"}}}}}

I tweaked the pipeline.batch.size to 1000 recently, and the graphs are much more what I'd expect historically (this ELK has been running for about 9 months) -

Output is all to ES. Nothing unusual -

output {
  elasticsearch {
    hosts => ["http://x.x.x.x:9200"]
    index => "logstash-%{+YYYY.MM.dd}"
  }
}

Cheers
Andy

system · May 5, 2019, 7:34pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
