Hello everybody.
I would like to discuss something I noticed while I was running some tests.
I had an application sending messages to a machine running a Logstash instance and Redis, and a second machine running a Logstash instance with ES.
The Logstash instance on the first machine only forwards the data to Redis without any filter, just input{...} and output{...}. On the second machine I extract the tags and the timestamp of each message when it arrives, and I write to ES with one index per day.
I had a problem with ES and had to delete the index for one day (December 26th). When I restored everything, I started receiving data in the index for the day I had deleted (December 26th) instead of the current day (January 4th). This seemed strange to me, because all the data I received had a December 26th timestamp, and I received 1.82 GB of it! So it behaved as if the second Logstash instance had a storage queue.
In Logstash there is a thread for each stage (input, filter and output), and they communicate with each other through queues, don't they? How can I control the size of these queues? Is there a parameter to configure it?
An event is stored in the Elasticsearch index that corresponds to the event's timestamp (the @timestamp field), so it's completely normal and expected that events from Dec 26 are stored in the Dec 26 index even if they're processed on Jan 4.
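To make this concrete, here is a minimal Python sketch of how a daily index name follows from the event's own timestamp rather than from the processing time. The `daily_index` helper, the `logstash-` prefix, the `YYYY.MM.dd` naming pattern, and the years in the example dates are assumptions for illustration, not taken from the configuration discussed in this thread.

```python
from datetime import datetime, timezone


def daily_index(event_timestamp: datetime, prefix: str = "logstash-") -> str:
    """Return the daily index name for an event (hypothetical helper).

    The name depends only on the event's timestamp, converted to UTC.
    The prefix and date pattern are assumed, not read from any real config.
    """
    return prefix + event_timestamp.astimezone(timezone.utc).strftime("%Y.%m.%d")


# Assumed dates: an event stamped Dec 26 that is processed on Jan 4.
event_time = datetime(2015, 12, 26, 10, 30, tzinfo=timezone.utc)
processed_at = datetime(2016, 1, 4, 9, 0, tzinfo=timezone.utc)

# The target index follows the event's timestamp...
target_index = daily_index(event_time)
# ...and not the day on which the event happens to be processed.
processing_day_index = daily_index(processed_at)

print(target_index)
print(processing_day_index)
```

If the Dec 26 index has been deleted, indexing an event with a Dec 26 timestamp would normally just cause Elasticsearch to create that index again, assuming automatic index creation is enabled.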
Logstash has two internal queues with room for 20 events each. This isn't configurable.
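For a sense of scale, the following Python sketch models a single bounded queue with room for 20 events. It is only an illustration of the idea, not Logstash's actual implementation; the names `QUEUE_CAPACITY` and `stage_queue` are made up for the example.

```python
import queue

# Capacity taken from the figure quoted above; the rest is illustrative.
QUEUE_CAPACITY = 20

stage_queue = queue.Queue(maxsize=QUEUE_CAPACITY)

accepted = 0
for n in range(25):
    try:
        # put_nowait raises queue.Full once the queue holds 20 events.
        stage_queue.put_nowait(f"event-{n}")
        accepted += 1
    except queue.Full:
        # A blocking put() would wait here until a consumer frees a slot.
        break

print(accepted)
```

A queue of this size holds at most a handful of events at a time, so it could not account for gigabytes of buffered data.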
Hi @magnusbaeck
Thank you very much for your answer.
Yes, but I deleted the December 26th index before restarting the system, so where were those events stored if that index no longer existed?
So the question isn't really "why did the events end up in the Dec 26 index" but actually "why did Logstash read Dec 26 events at all". It seems your log source still had unprocessed Dec 26 events. Without further clues about where your events came from it's impossible to tell.
What is clear is that Logstash itself has no 1.8 GB internal queue, and it has no feature to detect deleted indexes and reprocess the data.
Thanks @magnusbaeck
The log source is a simple UDP transmitter. Since it is a UDP transmission, it does not care whether the destination is available, and I have no queue for this transmission.