Can someone please explain the Beat architecture - Can a Beat be used to pull data from a list of sources?

baden0x1 · December 16, 2016, 1:43pm

I am trying to understand exactly how Beats work under the hood. From what I have read, they are mini daemons that run on a client and ship data to an Elasticsearch endpoint.

In our current platform, we have a Windows service that reads a list of data endpoints from a database, fetches data from each one, and writes that data back into the database.

What if I have a database list of endpoints that I want to probe for data, say, a list of RSS feeds, or even hardware? Would I use a custom-written Beat to get the list from the database and iterate through it?

How can this be accomplished with ELK?
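A minimal Python sketch of the "custom poller" idea from this question, assuming a hypothetical SQLite table `endpoints(url, kind, enabled)`. Nothing here is part of the Beats or Logstash APIs. It only shows that re-reading the table on every run is what keeps the list dynamic, and that grouping by kind lets RSS feeds and hardware get different handlers.

```python
import sqlite3

def load_endpoints(conn):
    """Return (url, kind) pairs for every enabled endpoint.

    Assumes a hypothetical table: endpoints(url TEXT, kind TEXT, enabled INTEGER).
    The query runs on every call, so edits to the table show up on the next run.
    """
    rows = conn.execute(
        "SELECT url, kind FROM endpoints WHERE enabled = 1 ORDER BY url"
    )
    return [(url, kind) for url, kind in rows]

def group_by_kind(endpoints):
    """Group URLs by kind (e.g. 'rss', 'device') so each kind gets its own handler."""
    groups = {}
    for url, kind in endpoints:
        groups.setdefault(kind, []).append(url)
    return groups

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE endpoints (url TEXT, kind TEXT, enabled INTEGER)")
    conn.executemany(
        "INSERT INTO endpoints VALUES (?, ?, ?)",
        [
            ("http://example.com/a.rss", "rss", 1),
            ("http://example.com/b.rss", "rss", 0),   # disabled, skipped
            ("tcp://192.0.2.10:9000", "device", 1),   # hypothetical hardware endpoint
        ],
    )
    for kind, urls in group_by_kind(load_endpoints(conn)).items():
        print(kind, urls)
```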

warkolm · December 17, 2016, 1:09am

[quote="baden0x1, post:1, topic:69294"]
What if I have a database list of endpoints that I want to probe for data, say, a list of RSS feeds, or even hardware?[/quote]

You could, or you could use the HTTP poller (http_poller) input in Logstash.
For the second case (hardware), it sounds like Metricbeat would work, but it depends on what you want to read from it.

baden0x1 · December 17, 2016, 1:48pm

Ok, please help me understand the difference between using a Beat and Logstash.

The list of endpoints is dynamic and can change between intervals.

I would like to start with HTTP, so I can begin with something simple like RSS feeds and do some parsing on them. Is there a built-in spooler when doing this through Logstash and Beats, since the list of sources may grow quite large? Are these jobs single- or multi-threaded? The latter would be ideal.

So, I would like the service or cron job (non-Windows speak, right?) to spin up on an interval, read from the database, select what to crawl based on some logic, pull it in, do some processing, and add the resulting documents to an index.
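A minimal sketch of one run of the job described above, assuming it is scheduled externally (cron, or Task Scheduler on Windows). The endpoint dict keys (`url`, `last_polled`, `interval`) and the `fetch`/`process` hooks are hypothetical. A bounded thread pool from the Python standard library provides the multi-threading, and a failing source is recorded rather than aborting the whole run.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def select_due(endpoints, now, default_interval=300):
    """Return endpoints whose last poll is at least their interval old.

    Each endpoint is a dict with hypothetical keys: 'url', 'last_polled'
    (epoch seconds) and an optional 'interval' in seconds.
    """
    return [
        ep for ep in endpoints
        if now - ep["last_polled"] >= ep.get("interval", default_interval)
    ]

def run_cycle(endpoints, fetch, process, max_workers=8, now=None):
    """One run of the job: select due sources, fetch them in parallel, process.

    fetch(url) -> raw payload and process(url, raw) -> list of documents are
    hypothetical, caller-supplied hooks. Returns (documents, errors).
    """
    now = time.time() if now is None else now
    due = select_due(endpoints, now)
    docs, errors = [], []
    # A bounded pool: at most max_workers sources are fetched at the same time.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(fetch, ep["url"]) for ep in due]
        # Futures are kept in submission order, so they line up with `due`.
        for ep, fut in zip(due, futures):
            try:
                docs.extend(process(ep["url"], fut.result()))
            except Exception as exc:  # one bad source should not stop the run
                errors.append((ep["url"], str(exc)))
    return docs, errors

if __name__ == "__main__":
    endpoints = [
        {"url": "http://example.com/a.rss", "last_polled": 0},
        {"url": "http://example.com/b.rss", "last_polled": 990, "interval": 60},
    ]
    fake_fetch = lambda url: "<rss/>"  # stand-in for a real HTTP GET
    to_docs = lambda url, raw: [{"source": url, "bytes": len(raw)}]
    docs, errors = run_cycle(endpoints, fake_fetch, to_docs, now=1000)
    print(docs, errors)  # only a.rss is due at now=1000
```

Keeping `max_workers` bounded is the main design choice: the list of sources may grow large, but only a fixed number are fetched at once, so one run cannot open an unbounded number of connections.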

In the future, data being read may be over TCP or even an FTP file to parse - that's much later, but something to keep in mind.

I'm just trying to understand the most efficient way of doing this with the ES stack, so the more detail, the better. I have a feeling that this whole thing is going to be much simpler to implement with ES than it was with .NET and Windows services.

warkolm · December 17, 2016, 8:44pm

A Beat does one thing: ship data. It doesn't do transformations or enrichment, and the set of inputs it can read from and outputs it can send to is limited.
So it ships that data, and does so efficiently and simply.
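For context on "ship data": what ultimately arrives at Elasticsearch is a batch of documents, for example a Bulk API request, whose body is newline-delimited JSON with an action line before each document and a trailing newline. A minimal Python sketch of building such a body; the index name is a hypothetical placeholder, and 5.x-era clusters also expect a `_type` in the action line (or in the request URL).

```python
import json

def bulk_body(index, docs):
    """Serialize docs into an Elasticsearch Bulk API request body (NDJSON).

    Each document becomes two lines: an action/metadata line, then the source.
    The body must end with a newline; an empty batch yields an empty string.
    """
    if not docs:
        return ""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    # "rss-items" is a hypothetical index name.
    print(bulk_body("rss-items", [{"title": "Hello"}]), end="")
```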

Logstash (LS) is more like an ETL tool: it can ship data from and to many different places, and it can do complex transformations along the way.
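To make the "transform" step concrete for the RSS case, here is a minimal Python sketch that turns an RSS 2.0 document into flat documents ready for indexing. In Logstash this kind of work would live in the filter stage (for example with its `xml` filter). The function name and output field names are hypothetical, and the code assumes well-formed RSS 2.0 without XML namespaces.

```python
import xml.etree.ElementTree as ET

def rss_to_docs(xml_text, source_url):
    """Parse an RSS 2.0 feed into a list of flat dicts, one per <item>."""
    root = ET.fromstring(xml_text)
    channel = root.find("channel")
    feed_title = channel.findtext("title", default="") if channel is not None else ""
    docs = []
    for item in root.iter("item"):
        # Missing child elements become empty strings rather than errors.
        docs.append({
            "source": source_url,
            "feed": feed_title,
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "published": item.findtext("pubDate", default=""),
        })
    return docs

if __name__ == "__main__":
    sample = """<rss version="2.0"><channel><title>Example</title>
      <item><title>First</title><link>http://example.com/1</link></item>
    </channel></rss>"""
    for doc in rss_to_docs(sample, "http://example.com/feed.rss"):
        print(doc)
```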

baden0x1 · December 18, 2016, 6:34am

Thank you, I have taken a look at http_poller. The issue I see is that the list of URLs has to be known in advance so it can be placed in the config file. My list of URLs will be dynamic and can change whenever the job spins up again.

In the documentation, I don't see any way for it to read a list of URLs from the database every time it spins up.

warkolm · December 18, 2016, 11:28pm

You'd have to get the URLs from somewhere and then put them in the config file.
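A minimal Python sketch of that workaround, assuming a hypothetical SQLite table `endpoints(url, enabled)`: read the URLs and render the `input` section of a Logstash pipeline using the `http_poller` plugin's `urls` and `schedule` options. The generated names (`url_0`, `url_1`, ...) are arbitrary, the filter and output sections are omitted, and Logstash still has to pick up the regenerated file; check the option names against the plugin version you run.

```python
import sqlite3

def load_urls(conn):
    """Read enabled URLs from a hypothetical endpoints(url, enabled) table."""
    return [row[0] for row in conn.execute(
        "SELECT url FROM endpoints WHERE enabled = 1 ORDER BY url")]

def render_http_poller(urls, every="5m"):
    """Render the input section of a Logstash pipeline for http_poller."""
    lines = ["input {", "  http_poller {", "    urls => {"]
    for i, url in enumerate(urls):
        # Plain double-quoted strings, so refuse anything that would break quoting.
        if '"' in url:
            raise ValueError("URL contains a double quote: " + url)
        # Each URL needs a name; url_0, url_1, ... are arbitrary placeholders.
        lines.append(f'      url_{i} => "{url}"')
    lines += [
        "    }",
        f'    schedule => {{ "every" => "{every}" }}',
        "  }",
        "}",
    ]
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE endpoints (url TEXT, enabled INTEGER)")
    conn.executemany("INSERT INTO endpoints VALUES (?, ?)",
                     [("http://example.com/a.rss", 1), ("http://example.com/b.rss", 1)])
    # Write this to whichever pipeline config file your Logstash instance loads.
    print(render_http_poller(load_urls(conn)), end="")
```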

baden0x1 · December 19, 2016, 3:10pm

Thank you. This means it is not dynamic and not an option for me, as I am not about to write a text parser/writer and restart the Beat every time. Hell no.

system · January 6, 2017, 1:44pm

This topic was automatically closed after 21 days. New replies are no longer allowed.

Related topics
- Can a Beat read from the ES database and is message queuing built into the Beats architecture? (Beats, 4 replies, 946 views, January 7, 2017)
- Custom Beats vs. Scripts (Beats, 4 replies, 817 views, October 8, 2016)
- Can beats pick data from database and send to elasticsearch (Beats, 2 replies, 6062 views, July 28, 2017)
- Is there a Beat that reads from DynamoDB? (Beats, 7 replies, 1252 views, December 8, 2016)
- How to ingest RSS and physical device data into ElasticSearch? (Elasticsearch, 4 replies, 1330 views, January 14, 2017)
