Reindexing/Updating Elasticsearch using Logstash on Jenkins

I would like to automate two tasks with a Jenkins job: updating Elasticsearch with the latest data on demand, and recreating the index and reloading the data.

I am using the jdbc input plugin to fetch data from two different databases (PostgreSQL and Microsoft SQL Server). When the Jenkins job is triggered on demand, Logstash should run the config file and perform the tasks above. We also have a cron job running on the same AWS server, where we would run the on-demand Logstash job. The issue is that the job triggered via Jenkins starts a second Logstash process alongside the one already started by cron on that server.

Is there a way to achieve this scenario? For example, can we terminate the Logstash process started by the Jenkins job, or is there some sort of queue that would let us enqueue our on-demand Logstash requests?

PS: I am new to the ELK stack; I can try to reframe my question if it doesn't make sense.

Thanks!

Why not use the same Jenkins job for both tasks? You can easily restrict the job so that it does not run concurrent builds. The scheduling can be done either within Jenkins itself, or a script run from crontab can fire off a Jenkins build.

@magnusbaeck Thanks for your reply. I am going to take a fresh look at this problem and start from scratch. Please advise on the following:

  • What's the recommended way of running Logstash - as a service or using the -f flag with a config file? We are running Elasticsearch and Logstash on a CentOS box on AWS.

  • We would like to manage 3 environments: dev, qa, and prod. Each of these environments will have 2 config files, fetching data from 2 different data sources and feeding it into Elasticsearch. We were thinking of running 3 different Logstash processes, one for each environment. Does that sound like a good approach?

  • In addition to the above, we would like to use the scheduler that can be configured in the Logstash config file instead of crontab or any other external scheduler. Does the schedule setting ensure that new data gets fed into Elasticsearch periodically?

Thanks

What's the recommended way of running Logstash - as a service or using the -f flag with a config file?

Normally one wants to run Logstash continuously (the only good exception would be if you're using Logstash for batch processing, something it's not geared towards), and running it as a service is then the obvious choice.

We would like to manage 3 environments: dev, qa, and prod. Each of these environments will have 2 config files, fetching data from 2 different data sources and feeding it into Elasticsearch. We were thinking of running 3 different Logstash processes, one for each environment. Does that sound like a good approach?

Three environments of what, an application? Or the log processing pipeline? In the former case I probably wouldn't bother with separate Logstash instances, but it depends on the circumstances. How closely tied are the application and Logstash? Will you ever have dependencies between the two?

In addition to the above, we would like to use the scheduler that can be configured in the Logstash config file instead of crontab or any other external scheduler. Does the schedule setting ensure that new data gets fed into Elasticsearch periodically?

Yes.
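
For reference, here's a minimal sketch of what a scheduled jdbc input can look like. The connection details, query, and index name are placeholders, not your actual values:

```
input {
  jdbc {
    # Placeholder connection settings - substitute your own driver and URL
    jdbc_driver_library => "/opt/jdbc/postgresql.jar"
    jdbc_driver_class => "org.postgresql.Driver"
    jdbc_connection_string => "jdbc:postgresql://localhost:5432/mydb"
    jdbc_user => "logstash"
    jdbc_password => "secret"
    # Run the query every five minutes (rufus-scheduler cron syntax)
    schedule => "*/5 * * * *"
    statement => "SELECT * FROM mytable"
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "mytable"
  }
}
```

With a `schedule` set, Logstash keeps running and the jdbc input re-executes the query on that cron schedule; without it the query runs once and the input finishes.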

Normally one wants to run Logstash continuously (the only good exception would be if you're using Logstash for batch processing, something it's not geared towards), and running it as a service is then the obvious choice.

We are not using Logstash for batch processing, but we do want Logstash to periodically process records that are added to the databases (PostgreSQL and Microsoft SQL Server) and update Elasticsearch with the delta. Hence, we would like to use "schedule" in the config files and place those config files in the conf.d folder so that the latest updates to the db get processed. Can this be achieved by running elasticsearch as a service?

Three environments of what, an application? Or the log processing pipeline? In the former case I probably wouldn't bother with separate Logstash instances, but it depends on the circumstances. How closely tied are the application and Logstash? Will you ever have dependencies between the two?

Three environments (dev, qa, prod) of the databases (PostgreSQL and Microsoft SQL Server) that get fed into Elasticsearch (the same set of indexes for all three environments) using Logstash. The Elasticsearch indexes are used by an application running in dev, qa, and prod. So there is an indirect dependency between the two: if the Logstash process doesn't feed Elasticsearch in a timely manner, the application will not show the latest changes in the underlying databases. Please recommend an approach for handling this use case.

Hence, we would like to use "schedule" in the config files and place those config files in the conf.d folder so that the latest updates to the db get processed. Can this be achieved by running elasticsearch as a service?

You mean Logstash? Yes.
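
To fetch only the delta on each scheduled run, the jdbc input can track a column value between runs via `:sql_last_value`. A sketch, where the table and `updated_at` column are made-up examples:

```
input {
  jdbc {
    jdbc_driver_library => "/opt/jdbc/postgresql.jar"
    jdbc_driver_class => "org.postgresql.Driver"
    jdbc_connection_string => "jdbc:postgresql://localhost:5432/mydb"
    jdbc_user => "logstash"
    jdbc_password => "secret"
    schedule => "*/5 * * * *"
    # Remember the highest updated_at seen so far; :sql_last_value is
    # persisted on disk between runs (see last_run_metadata_path)
    use_column_value => true
    tracking_column => "updated_at"
    tracking_column_type => "timestamp"
    statement => "SELECT * FROM mytable WHERE updated_at > :sql_last_value"
  }
}
```

On the first run `:sql_last_value` starts at a zero value, so the full table is loaded; subsequent runs only pick up rows newer than the last tracked value.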

Three environments (dev, qa, prod) of the databases (PostgreSQL and Microsoft SQL Server) that get fed into Elasticsearch (the same set of indexes for all three environments) using Logstash. The Elasticsearch indexes are used by an application running in dev, qa, and prod. So there is an indirect dependency between the two: if the Logstash process doesn't feed Elasticsearch in a timely manner, the application will not show the latest changes in the underlying databases. Please recommend an approach for handling this use case.

You can go either way. Running separate Logstash instances would separate the environments more, but at the cost of increased complexity. Since you'll still be able to have a unique Logstash configuration for each environment, it's not clear what the benefits of separate instances would be.
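
One way to keep a single config while still varying it per environment is environment variable substitution in the config file (supported in recent Logstash versions; the variable names and hosts below are made up for illustration):

```
output {
  elasticsearch {
    # ES_HOST and ENV_NAME are set per environment (dev/qa/prod) in the
    # Logstash service's environment; the value after the colon is the
    # default used when the variable is unset
    hosts => ["${ES_HOST:localhost:9200}"]
    index => "mytable-${ENV_NAME:dev}"
  }
}
```

That way the dev, qa, and prod pipelines share one config file, and only the environment of the Logstash process differs.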

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.