I'm new to Elasticsearch.
I have Logstash configurations with PostgreSQL and MongoDB as data sources (data.postgresql.conf, data.mongodb.conf). My problem is that I have to launch the PostgreSQL Logstash configuration first in order to initialize the MongoDB data: in my JDBC statement I have a query like `SELECT table.*, 0 AS count ...`, then my MongoDB query computes the actual `SELECT COUNT(*) ...`, and then I replace the count in the same index. Isn't there a more efficient way to do this and keep the two databases synchronized in Elasticsearch?
Given the two disparate systems, it may be worthwhile to use a client to pull and organize the data yourself.
For instance, you could use the Python client to query PostgreSQL for the initial data set, then query MongoDB to enrich it, then bulk the finished documents into Elasticsearch.
That way you keep control of the transactions and the order of operations.
What you're currently doing isn't unheard of, but it sits more in the realm of "eventual consistency" as far as the data flow goes, and it can become burdensome when you have to "initialize" something before the data will flow correctly.
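In practice that could look something like the sketch below. This is only a minimal sketch, not a definitive implementation: the table name (`products`), the MongoDB collection (`events`), the join key (`product_id`), the index name, and all connection details are hypothetical placeholders, and it assumes the `psycopg2`, `pymongo`, and `elasticsearch` Python packages are installed.

```python
# Minimal sketch: pull base rows from Postgres, enrich each row with a
# count from MongoDB, then bulk-index the finished documents into ES.
# All names and connection strings below are hypothetical placeholders.
import psycopg2
from pymongo import MongoClient
from elasticsearch import Elasticsearch, helpers

pg = psycopg2.connect("dbname=appdb user=app password=secret host=localhost")
mongo = MongoClient("mongodb://localhost:27017")["appdb"]
es = Elasticsearch("http://localhost:9200")

def generate_docs():
    with pg.cursor() as cur:
        # Step 1: pull the base rows from Postgres.
        cur.execute("SELECT id, name FROM products")
        for product_id, name in cur:
            # Step 2: enrich each row with the real count from MongoDB,
            # instead of writing a 0 placeholder and patching it later.
            count = mongo["events"].count_documents({"product_id": product_id})
            yield {
                "_index": "products",
                "_id": product_id,
                "_source": {"name": name, "count": count},
            }

# Step 3: bulk-index the fully built documents in one pass.
helpers.bulk(es, generate_docs())
```

Note this issues one MongoDB count per row; for a large table it would likely be faster to fetch all counts once with an aggregation grouped by `product_id` and look them up from a dict inside the loop.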
If I understand correctly, I have to retrieve the data from PostgreSQL and MongoDB with queries, transform it, and then ingest it into ES with the bulk helper from the Python Elasticsearch library?