From what I have read so far, a Beat executes cron-like jobs based on file-based configuration parameters.
Is there a way for a Beat to read from the Elasticsearch database to determine what to do, i.e. what URL to crawl, for example?
Are the jobs performed by a Beat queued in some sort of message queue or spooler (as I have seen in the architecture diagram for Filebeat), or is this something that has to be introduced separately?
Beats are not externally driven (cron-job-like) job processors.
Every beat is a specialised application (a shipper) used to collect events. Some beats may internally decide to schedule recurring tasks, but this is an internal implementation detail. For example, Filebeat tails your files and reads new lines the moment they become available, but uses background threads with timeouts to scan the filesystem for new files. Packetbeat analyses live network traffic. Metricbeat executes internal periodic tasks, and so on.
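To make the "internal scheduling" point concrete, here is a minimal sketch in Go (the language the beats are written in) of how a metricbeat-style beat might drive its own periodic collection with a ticker instead of an external cron job. The `collectMetrics` function and the 10-second period are hypothetical placeholders, not the actual libbeat API:

```go
package main

import (
	"fmt"
	"time"
)

// collectMetrics is a hypothetical stand-in for a real metricset fetch;
// each invocation would produce events that the beat ships onward.
func collectMetrics() {
	fmt.Println("collected event at", time.Now().Format(time.RFC3339))
}

func main() {
	// The scheduling lives entirely inside the process: a ticker fires
	// on a fixed period, with no cron or other external driver involved.
	ticker := time.NewTicker(10 * time.Second)
	defer ticker.Stop()

	for range ticker.C {
		collectMetrics()
	}
}
```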
Filebeat does not support FTP. Trying to tail files via FTP doesn't sound like a fun task to me. Normally users install Filebeat on the edge machines themselves. You can send to Logstash or directly to Elasticsearch (e.g. using an Ingest Node pipeline).
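For the common edge-machine setup, a minimal filebeat.yml sketch might look like the following (the log path, hosts, and pipeline name are placeholders; the `pipeline` option is what hands each event to an Elasticsearch Ingest Node pipeline):

```yaml
# Filebeat installed on the edge machine tails local log files
# and ships events directly to Elasticsearch.
filebeat.inputs:
  - type: log
    paths:
      - /var/log/app/*.log   # placeholder path to the logs being tailed

output.elasticsearch:
  hosts: ["http://localhost:9200"]   # placeholder Elasticsearch endpoint
  pipeline: my-ingest-pipeline       # placeholder Ingest Node pipeline name
```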