Use Logstash persistent queue, or use Redis/Kafka/RabbitMQ? (On a server that's offline most of the time)

dreamspy · March 28, 2020, 5:23pm

Hi there

I'm in the situation where I'll have robots/machines that are offline most of the time, since they'll be out in the field doing measurements in situations where one can't have access to the internet.

But when they are docked for charging they will have access to the internet again, and they will offload measurements and metrics to Elasticsearch. At the moment we're planning to use Beats and Logstash for that, along with the persistent queue functionality of Logstash.

Since it is of utmost importance that these measurements don't get lost, I would like to verify with you guys whether the setup I'm thinking of is robust enough, and whether this is the preferred way of doing queuing. I've heard about using queuing software like Redis, Kafka or RabbitMQ, but I'm not sure they are any better than the persistent queue built into Logstash, which stores the queue on disk, so it shouldn't lose the queue in case of a power failure.

I can easily think of scenarios where the logs/measurements can get lost:

The computer on the robot crashes, thinks that it has sent the logs, but hasn't
The program crashes, same result
Internet connection is down for too long and the logs build up, filling memory/disk space
...and probably endless more possibilities that I'll never be able to think up

So the plan looks like this:

On the robots I'll be running Beats (Filebeat etc.) to read log files, which are sent to Logstash on the same computer, which will filter and tailor the data.

I then need to both store a local copy of the tailored data on the robot, and send it to a centralized server running Elasticsearch.

So I was thinking that Logstash would take care of writing the data to a local database on the robot, possibly Elasticsearch. It would also fill up the persistent queue while the robot is offline, which will be most of the time. Once the robot regains internet access, the queue would be emptied to a remote server on AWS running Elasticsearch.
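
To make the plan concrete, it could be sketched as a `pipelines.yml` roughly like the one below. This is an untested sketch, not a recommended setup. It assumes Logstash's pipeline-to-pipeline communication (the "output isolator" pattern described in the Logstash docs), so that each output gets its own persistent queue and an unreachable remote server doesn't stall the local writes. The pipeline IDs, virtual addresses, port, queue size and host names are all placeholders made up for illustration.

```yaml
# pipelines.yml (sketch only; every ID, address, host and size is a placeholder)

# Receives events from Filebeat on the robot and fans them out.
- pipeline.id: intake
  queue.type: persisted
  config.string: |
    input { beats { port => 5044 } }
    # filters that tailor the data would go here
    output { pipeline { send_to => ["local-store", "remote-es"] } }

# Writes to the database on the robot itself (assumed here to be Elasticsearch).
- pipeline.id: local-store
  queue.type: persisted
  config.string: |
    input { pipeline { address => "local-store" } }
    output { elasticsearch { hosts => ["http://localhost:9200"] } }

# Buffers on disk while offline and drains to the remote cluster when docked.
- pipeline.id: remote-es
  queue.type: persisted
  queue.max_bytes: 8gb   # assumed size; must cover the longest offline period
  config.string: |
    input { pipeline { address => "remote-es" } }
    output { elasticsearch { hosts => ["https://es.example.com:9200"] } }
```

As far as I understand it, once a persistent queue reaches `queue.max_bytes` Logstash stops accepting new events and pushes back on its inputs, so the queue size and the free disk space on the robot would have to be dimensioned for the longest expected time in the field.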

Does this sound about right? Am I on the right track or should I use some other methods?

dreamspy · March 28, 2020, 5:29pm

I would like to add that the data that'll be stored in the DB on the robot itself is only a small subset of the whole data that'll be stored on the AWS server.

So the idea of writing only to the local database, and then using something to sync that DB to the AWS one, would not work in this case.

system · April 25, 2020, 5:29pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
