Streaming a custom data source to Elastic

Hello - This is a question about best practices for streaming custom data to Elastic.

Currently I have a script that pulls one event from my API and then posts it to /indexname/_doc. This works great, but what is the best way to run this continuously?
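
The current per-event call is roughly equivalent to this (a minimal sketch in Python rather than my actual script; the API URL, index name, and credentials are placeholders):

```
import requests

# Sketch of the current approach: fetch one event from the API and POST it
# to Elasticsearch as a single document. URLs and credentials are placeholders.
event = requests.get("https://api.example.com/event").json()

resp = requests.post(
    "https://localhost:9200/indexname/_doc",
    json=event,
    auth=("elastic", "changeme"),
    verify=False,  # local test cluster only
)
resp.raise_for_status()
```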

One option I considered is to pull the data down in a script, save it to a JSON document on disk, and then configure Filebeat to ship that data to Elastic, but I wanted to ask others for advice before taking that route.
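
Concretely, the script would just append one JSON object per line to a file and let Filebeat tail it; something like this sketch (the API URL, output path, and poll interval are placeholders):

```
import json
import time
from pathlib import Path

import requests

# Sketch: append each event as one JSON object per line (NDJSON) so Filebeat
# can tail the file and ship it with its own retry handling. The API URL,
# output path, and poll interval are placeholders.
OUTPUT = Path("/var/log/myapp/events.ndjson")

while True:
    event = requests.get("https://api.example.com/event").json()
    with OUTPUT.open("a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")
    time.sleep(30)
```

Filebeat would then need a filestream/log input pointing at that path with NDJSON parsing enabled, and the script (or log rotation) would still have to clean up old files.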

Obviously there is also the Bulk API, which could be coded straight into the script; however, since Filebeat uses the Bulk API under the covers anyway, why reinvent the wheel?
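
For reference, coding it straight into the script would mean building the newline-delimited _bulk body myself, roughly like this sketch (index name, events, and connection details are placeholders):

```
import json
import requests

# Sketch of calling the _bulk endpoint directly: each document needs an
# action/metadata line followed by the source line, newline-delimited.
events = [{"message": "event 1"}, {"message": "event 2"}]  # placeholder events

lines = []
for event in events:
    lines.append(json.dumps({"index": {"_index": "indexname"}}))
    lines.append(json.dumps(event))
body = "\n".join(lines) + "\n"  # the Bulk API requires a trailing newline

resp = requests.post(
    "https://localhost:9200/_bulk",
    data=body,
    headers={"Content-Type": "application/x-ndjson"},
    auth=("elastic", "changeme"),
    verify=False,  # local test cluster only
)
resp.raise_for_status()
# A real implementation would also check resp.json()["errors"] and retry
# failed items, which is the part Filebeat gives you for free.
```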

Thank you for any feedback.

What kind of script is it? Java based? A pure Unix shell script?

There are pros and cons to both methods, but you surely know that.

Using Filebeat is a good option because you benefit from its retry mechanism in case of a failure on the Elasticsearch side, for example.
On the other hand, you will have to write/update a file and think about cleaning up the old files.

My gut feeling is that if I were using Java, for example, I'd use the BulkProcessor, accumulate documents in it, and let it flush to Elasticsearch. But in that case you need to deal with errors "manually".
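
Outside of Java the idea is the same; as a rough sketch, the Python client's bulk helpers do the batching for you (connection details, index name, and the event list are placeholders, and this is the helpers-based equivalent, not the Java BulkProcessor itself):

```
from elasticsearch import Elasticsearch
from elasticsearch.helpers import streaming_bulk

# Sketch using the Python client's bulk helpers (8.x client assumed);
# connection details, index name, and the event list are placeholders.
es = Elasticsearch("https://localhost:9200", basic_auth=("elastic", "changeme"))

events = [{"message": "event 1"}, {"message": "event 2"}]  # placeholder events

def actions():
    for event in events:
        yield {"_index": "indexname", "_source": event}

# streaming_bulk batches the actions and yields one result per document,
# so failed items still have to be handled "manually".
for ok, item in streaming_bulk(es, actions(), chunk_size=500, raise_on_error=False):
    if not ok:
        print("failed to index:", item)
```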

Another solution is to write your documents to a message queue system (like Kafka, Redis, RabbitMQ...) and have Logstash read from that queue.
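
For example, with Redis the producing side can be as small as this sketch (the hostname, list key, and API URL are placeholders), with Logstash's redis input reading the same list:

```
import json

import redis
import requests

# Sketch of the producing side: push each event onto a Redis list and let
# Logstash (redis input, data_type => "list", key => "events") do the
# indexing. Hostnames, the list key, and the API URL are placeholders.
r = redis.Redis(host="localhost", port=6379)

event = requests.get("https://api.example.com/event").json()
r.rpush("events", json.dumps(event))
```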

My 0.05 cents. Happy to hear from others.

Hi David, thank you for the feedback, that is helpful. I may try Filebeat first and see how that goes, and what the management side of things looks like.

The script itself is Windows PowerShell, but this is more of a POC, and I will likely move it to Python or use the Elastic Node.js client.

From an architecture perspective, is running a script either continuously or every 30/60 seconds via cron a good use case for containers?
