Dynamic sync between Mongo to Elastic search with Kafka and Logstash spring application

in this sequence we want to talk about Data base Connectivity especially the dynamic sync between mongo and elastic search with the aid of Apache Kafka and Logstash for ingesting document beside that we want optimize the mapping structure and multi field mapping as our desire, we will use two different Spring application to observe some concept about search algorithms and finally we have online Kibana dashboard to monitor every single event from mongo, you can find all related codes in github, as bones we designed the all 3 forms of queries and their integration Jpa-Dsl-ElasticksearchRepository.

(post withdrawn by author, will be automatically deleted in 24 hours unless flagged)

1.The First is to create an application based on your needs on Mongo , you will need to add Kafka maven dependency and two classes as follow , one for the driver which contains Beans of Kafka template ,Message producer and Failure and success handler (Optional) , the other class contains topic builder . beside this you need to set up your application properties about Kafka. with completing this steps you have a powerful stream processing and very suitable pipeline for beginning the data ingestion onto the next step.

2.On the second Stage we will write the Logstash Configuration file to ingest document into the elastic , this step was bit challenging for my self too because there where no Mongodb input plugin for Logstash, I left at two choice one were using Java database connectivity( Jdbc ) and the other was using Kafka therefore i decided to use Kafka how ever, I've already write the very stable set ups and design for Jdbc case but we will discus about later in an other topic.
Our Logstash configuration Consist of 3 part as usual which you can find it in github repository i will not explain the filters and procedure in configuration because they are very trivial how ever you can find their usage and application with bit of search, as the final point on this stage just for the case of having back up ,I've added the file output filter for saving every event and documents inside the Csv file.
As a bonus I've also added some other Logstash configuration that may help you in other application please note that the suitable configuration for our case in hand is logstashKafka.conf , that's it we are done on this stage!

3.On the third stage, after we have successfully added our Kafka related class to our spring application we active the Zookeeper and Kafka broker beside that we will add a consumer to the topic which we created name as "First-Topic: too keep the track of stream processing result and event changes in the entrance of our Logstash pipeline , this is supper cool because we can have stream processing in Logstash also ! like Grok filter and get rigid of bottle neck or load balance issues very simple by aid of Kafka and it's queue structure.
After successfully activating brokers and consumer , we need obviously active our databases, Mongo and Elastic and elastic Gui ( Kibana )for observing changes , when we did this steps on this stage we will remove our Logstash configuration to bin folder of our installed directory of Logstash then executing it by famous command ./logstash -f logstashKafka.conf .

4.Now we need to create a design to publish our messages which is the pattern to do it in our application is completely optional, you can find a way to do it in github repository( createWarren methode ). regardless of how you design you Mongo based application and you pipeline design on Kafka we are done on this side the left is to design mapping (which i recommend multifield mapping ) you can run your application and test you designed Api for ingesting your object or event to Logstash with aid of any Api client such as postman.
Please note by default you can execute the related Api and observe that the index which we write in Logstash automatically created , but i would recommend you the design your manual analyzer and multifield mapping om your costume node.

5.We kept the procedure at the simplest way as it was possible, before ingesting the data to your index i would recommend you to design you client application-stream processing procedure-Logstash stream processing -shading and replication as your need and the mapping of your index, you can also create an other application based on Elasticsearch engine to observe some connectivity and differences between Elk (Apache Lucene) and Mongo.
Pleas feel free to make contact or navigate this post , any tips about it will be appreciate it !

Stay safe and best regards !

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.