I am trying to understand exactly how Beats work under the hood. From what I have read, they are mini daemons that run on a client and ship data to an Elasticsearch endpoint.
In our current platform, we have a Windows service that reads a list of data endpoints from a database, fetches data from them, and puts it into the database.
What if I have a database list of endpoints that I want to pull data from - say a list of RSS feeds, or even hardware? Would I use a custom-written Beat to get the list from the database and iterate through it?
[quote="baden0x1, post:1, topic:69294"]
What if I have a database list of endpoints that I want to pull data from - say a list of RSS feeds, or even hardware?[/quote]
You could, or you could use the HTTP (poller) input from Logstash.
For the hardware case, it sounds like Metricbeat could cover it, but that depends on what you want to read from it.
Ok, please help me understand the difference between using a Beat and Logstash.
The list of endpoints is dynamic and can change between intervals.
I would like to start with HTTP - that way I can begin with something simple like RSS feeds and do some parsing on them. Is there a built-in spooler when doing this through Logstash and Beats, since the list of sources may grow to be quite large? Are these jobs single- or multi-threaded? The latter would be optimal.
So, I would like the service or cron job (non-Windows speak, right?) to spin up on an interval, read from the database, select what to crawl based on some logic, pull it in, do some processing, and add documents to an index.
In the future, data being read may be over TCP or even an FTP file to parse - that's much later, but something to keep in mind.
I'm just trying to understand the most efficient way of doing this with the Elastic Stack, so the more detail provided, the better. I have an underlying feeling that this whole thing is going to be much simpler to implement with Elasticsearch than it was using .NET and Windows services.
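To make that concrete, here is a rough sketch of the kind of loop I have in mind, written in Go instead of .NET just to have something runnable. It assumes the endpoint list lives in a Postgres table called `endpoints` and that items are indexed straight into Elasticsearch over its plain HTTP `_doc` API; the table, index, and connection details are all placeholders:

```go
package main

import (
	"bytes"
	"database/sql"
	"encoding/json"
	"encoding/xml"
	"io"
	"log"
	"net/http"
	"sync"
	"time"

	_ "github.com/lib/pq" // any database/sql driver would do; Postgres is just a placeholder
)

// rssFeed is the minimal slice of an RSS 2.0 document we care about.
type rssFeed struct {
	Channel struct {
		Items []struct {
			Title   string `xml:"title"`
			Link    string `xml:"link"`
			PubDate string `xml:"pubDate"`
		} `xml:"item"`
	} `xml:"channel"`
}

func main() {
	db, err := sql.Open("postgres", "postgres://user:pass@localhost/platform?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// One polling cycle per interval; the endpoint list is re-read from the database each cycle, so it can change between runs.
	for range time.Tick(5 * time.Minute) {
		rows, err := db.Query("SELECT url FROM endpoints WHERE enabled = true")
		if err != nil {
			log.Println("db:", err)
			continue
		}
		var urls []string
		for rows.Next() {
			var u string
			if err := rows.Scan(&u); err == nil {
				urls = append(urls, u)
			}
		}
		rows.Close()

		// Fetch the feeds concurrently: one goroutine per URL.
		var wg sync.WaitGroup
		for _, u := range urls {
			wg.Add(1)
			go func(u string) {
				defer wg.Done()
				if err := pollFeed(u); err != nil {
					log.Println(u, ":", err)
				}
			}(u)
		}
		wg.Wait()
	}
}

// pollFeed downloads one RSS feed, parses it, and indexes each item into Elasticsearch.
func pollFeed(url string) error {
	resp, err := http.Get(url)
	if err != nil {
		return err
	}
	body, err := io.ReadAll(resp.Body)
	resp.Body.Close()
	if err != nil {
		return err
	}
	var feed rssFeed
	if err := xml.Unmarshal(body, &feed); err != nil {
		return err
	}
	for _, item := range feed.Channel.Items {
		doc, _ := json.Marshal(map[string]string{
			"title":     item.Title,
			"link":      item.Link,
			"published": item.PubDate,
			"source":    url,
		})
		// Plain HTTP call to Elasticsearch's index API; the _bulk API would be the better choice once volume grows.
		res, err := http.Post("http://localhost:9200/feeds/_doc", "application/json", bytes.NewReader(doc))
		if err != nil {
			return err
		}
		res.Body.Close()
	}
	return nil
}
```

One goroutine per feed keeps it simple; a bounded worker pool would be the obvious change once the list of sources grows large.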
A Beat does one thing: ship data. It doesn't do transformations or enrichments, and the inputs and outputs it supports are limited.
So it ships that data, and does it efficiently and simply.
Logstash is more like an ETL tool: it can ship data from and to many different places, and it can do complex transformations.
Thank you, I have taken a look at http_poller. The issue I see with this is that the list of URLs needs to be known up front and placed in the config file. My list of URLs will be dynamic and can change whenever the job spins up again.
In the documentation, I don't see how it could read a list of URLs from the database every time it spins up.