Logstash Queue

Hi, I am using Logstash to parse my logs. This is my architecture: Filebeat ships logs from the server to Logstash, Logstash parses the logs and sends them to Elasticsearch, and the logs in Elasticsearch can be viewed using Kibana. Now one of my clients said that he can have about 500 MB of logs generated in a day. So, in order to test whether my ELK stack can handle this amount of traffic, I wrote the bash script shown below:

while true
do
    dt=$(date '+%d/%m/%Y %H:%M:%S')
    echo "$dt local.ERROR: $dt" >> testing.log
done

The contents of this testing.log file are getting shipped to Logstash. But even though I stopped the script after 12 hours or so, logs were still getting printed on the console. What does this mean? Furthermore, the timestamps were old, which means Logstash is somehow queuing the logs somewhere. Does Logstash queue logs? If yes, what can I do to remove these logs from the queue?

Filebeat (and/or Logstash) probably can't keep up with your loop that does nothing except append data to a file. That's why Logstash keeps processing the messages even after you've shut down the shell script. The file itself effectively becomes Logstash's buffer/queue.

500 MB/day is about 6 kB/s if we naïvely assume a uniform distribution over the day. That's nothing and way less than what you produced with your shell script.
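For reference, the back-of-the-envelope arithmetic (taking 500 MB as 500 × 1024 × 1024 bytes and a day as 86,400 seconds):

$ echo "$((500 * 1024 * 1024 / 86400))" bytes/s
6068 bytes/s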

@magnusbaeck So how do I stop Logstash from parsing the logs of the file?

Besides shutting down Logstash/Filebeat? Well, deleting the file or removing it from the configuration should do it.
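For example, something along these lines (the registry path is an assumption; it varies with the Filebeat version and how it was installed, so check the registry_file setting in your filebeat.yml):

# stop Filebeat so it stops tailing the file
sudo service filebeat stop

# delete the runaway log file (and/or remove its path from the prospector config)
rm testing.log

# optionally delete Filebeat's registry so it forgets its recorded read offsets
sudo rm /var/lib/filebeat/registry

sudo service filebeat start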

@magnusbaeck Shutting down Logstash and restarting it does not do the desired thing, but I believe deleting the file surely will.

@magnusbaeck One more thing: when I monitor the node with ELK installed on it while this bash script is running, I can see about 100% CPU usage, and the RAM usage was increasing continuously. So up to what value will it go? Earlier I was pushing my logs from Logstash to Elasticsearch along with printing them on the console, due to which RAM usage was continuously increasing; then I disabled printing the logs to the console, and now it is not continuously increasing. To give the stats: earlier it went up to 2600/3764 and now it is 1993/3764. What is the worst case it can reach?

Shutting down Logstash and restarting it does not do the desired thing, but I believe deleting the file surely will.

Restarting won't help since they're designed to pick up from where they stopped.

What is the worst case it can reach?

Which process, Logstash or Elasticsearch?

@magnusbaeck The worst case for RAM because of such a heavy load. Can it happen that the ELK stack goes down after too much RAM is used? Or that it becomes extremely slow?

Elasticsearch could definitely run out of heap space if you push too much data into it. There's no formula for deciding how much heap space you need, so it's something you have to figure out yourself and monitor.
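One way to keep an eye on the heap is Elasticsearch's nodes stats API; look at jvm.mem.heap_used_percent for each node:

curl -XGET 'http://localhost:9200/_nodes/stats/jvm?pretty'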

@magnusbaeck Yes, we are thinking of monitoring Elasticsearch using Datadog.

  1. Will this be a good idea? Can you suggest an alternative if it's not?
  2. Further, is it possible that because of that bash script I might not get data from other servers? I am saying this because I could not see logs after 10:03:51 in Elasticsearch, but the data was produced on the server.
  3. So, in order to prevent this situation, should I add a node? Right now I have Logstash, Elasticsearch, and Kibana installed on the same machine. Will adding a node guarantee that I will not lose data from the other server(s)?
  1. I have no experience with Datadog. Anything that works for you by letting you track process resource usage is a good thing.
  2. Yes, this is a possibility. If Logstash is clogged it may push back against network connections. Depending on the sending side you might not lose messages permanently, but they could get delayed. If you're not receiving events from the network but just reading local files, you shouldn't lose any messages (though because of log rotation it's not impossible).
  3. There are never guarantees. To deal with systems that go crazy and produce unlimited amounts of data you need unlimited resources somewhere. You have to pick a level you want to be able to support and design your system around that.

@magnusbaeck So by point number two you mean that I won't lose data, it will just get delayed, right? I will get the data after some time?

Local files act as a buffer for a limited time, yes.
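A rough way to see how far behind the shipper is: compare the file's size on disk with the read offset Filebeat has recorded for it (again, the registry path is an assumption; check registry_file in filebeat.yml):

# current size of the file in bytes
wc -c testing.log

# Filebeat's recorded read position; look at the "offset" field for testing.log
cat /var/lib/filebeat/registry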

@magnusbaeck So let's suppose I add a node (on a different machine) to my existing setup (Elasticsearch, Logstash, and Kibana on the same machine).

  1. Do I need to create the indices again using curl on the newly added node, or will Elasticsearch do it itself? From what I've read, it will automatically do the desired thing on its own.
  2. Further, let's say I need to add an index in the future. Do I need to add it on the same old machine on which I have added indices so far (using curl -XPUT 'http://localhost:9200/index_name'), or do I need to add the index on the newly added node, as in curl -XPUT 'http://newly_added_node_ip:9200/index_name'?
  3. Will my Kibana automatically become aware of the newly added node and give me results for the indices present on the newly added node(s)?
  1. Yes, it'll take care of that for you.
  2. You can create indexes on any node; see the example below.
  3. Yes.
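For example (the node address is a placeholder):

# create an index via the newly added node...
curl -XPUT 'http://newly_added_node_ip:9200/index_name'

# ...and any node in the cluster will know about it
curl -XGET 'http://localhost:9200/_cat/indices?v'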

Note that one should be careful when running two nodes, as a split-brain situation can occur. You should either have three nodes (and set discovery.zen.minimum_master_nodes to two) or make just one of the nodes master-eligible. There's a lot about running a cluster in the documentation; I recommend you read it.
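With three nodes, the setting can be applied dynamically through the cluster settings API (assuming ES listens on localhost:9200), or by putting discovery.zen.minimum_master_nodes: 2 in each node's elasticsearch.yml:

curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "persistent": { "discovery.zen.minimum_master_nodes": 2 }
}'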

And obviously, if Logstash is the bottleneck in your system it won't help to add an extra ES node.

@magnusbaeck Now I am getting the logs after a lot of delay, probably because of this bash script.

1. Do you also think it is because of the bash script? I ask because I haven't yet got the logs of 03/02/2016; Logstash is still parsing the logs of 02/02/2016.
2. So how can I scale Logstash now?
3. One option is using more than one Logstash instance. Is that a good idea? I don't think so, because the delay will still be there.
4. Will increasing the number of processors scale our Logstash?

@magnusbaeck What can be the reason for the increasing memory consumption? Can it be because of:

  1. some kind of memory leak?
  2. a Logstash configuration issue?
  3. something else?

How can I find out?
  1. I can just guess and make suggestions. You'll have to debug this on your side.
  2. Looking into the number of filter workers (or the equivalent setting in Logstash 2.2+) could help throughput if the CPUs aren't saturated; see the sketch after this list. Inefficient filters can also reduce performance. Another option is to run multiple Logstash instances that pull from a broker, or to put a load balancer in front of Logstash. But the first order of business is finding out where the bottleneck is. How do you know Logstash is the problem?
  3. See above.
  4. Yes, additional CPU power is likely to improve throughput regardless of where the bottleneck is (Logstash or ES). Note that I say "likely".
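A minimal sketch of bumping the worker count (the value 4 and the config path are assumptions; match the workers to your CPU cores, and note that in Logstash 2.2+ -w controls pipeline workers rather than filter workers):

# start Logstash with 4 worker threads
bin/logstash -w 4 -f /etc/logstash/conf.d/logstash.conf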

@magnusbaeck I strongly suspect Logstash is the problem, because when I watch the console I can see Logstash still parsing the logs of 02/02/2016, and today's date is 04/02/2016. So the problem is with Logstash for sure, don't you think? And can there be a bottleneck on the ES side at all? I mean, ES can show logs only when it gets them, and if it is not getting the logs, how can it show them? So how can ES be the bottleneck?

I strongly suspect Logstash is the problem, because when I watch the console I can see Logstash still parsing the logs of 02/02/2016, and today's date is 04/02/2016. So the problem is with Logstash for sure, don't you think?

I do think Logstash is the problem, but you're not interpreting the evidence correctly. Your pipeline looks like this:

file on disk => Logstash => Elasticsearch

Regardless of whether the bottleneck is Logstash or ES, the effect early in the pipeline will be the same. If ES is too slow it'll push back against Logstash, and since Logstash doesn't have a buffering mechanism to speak of, it'll scale back the log reading.

And can there be a bottleneck on the ES side at all? I mean, ES can show logs only when it gets them, and if it is not getting the logs, how can it show them? So how can ES be the bottleneck?

Yes, of course ES can be a bottleneck.