Hello everyone,
Does anyone have an idea of how I can synchronise my websites (built with WordPress, Drupal, Joomla...) with my Elasticsearch cluster?
Thank you in advance.
You could maybe try to find a web crawler, but IMO it would be too generic.
I'd use dedicated connectors.
For example, here is an article for Drupal which can help you: http://redcrackle.com/blog/configuring-drupal-elasticsearch-facet-search-functionality
I hope this helps.
Thank you @dadoonet for your answer, I will read this article. Do you think it is better to use a web crawler / dedicated connector, or to do something directly in the database (triggers/transactions/logs)?
I always prefer sending the data to Elasticsearch within the same "transaction" that saves your data to the database.
I wrote an article about it: http://david.pilato.fr/blog/2015/05/09/advanced-search-for-your-legacy-application/
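The "index within the same transaction" pattern can be sketched as follows. This is a minimal illustration, not code from the article: the database and the Elasticsearch index are replaced by in-memory dicts so the pattern is visible without a running server, and `save_article` is a hypothetical helper. In a real application you would use your DB driver and an Elasticsearch client call (e.g. `es.index(...)`) instead.

```python
# Stand-ins for the real stores: in a real app these would be a relational
# database and an Elasticsearch index.
database = {}       # authoritative store
search_index = {}   # search copy kept in sync

def save_article(article_id, title, body):
    """Persist to the DB, then index for search in the same code path,
    so the search index only sees data the database accepted."""
    # 1. write to the database (the source of truth)
    database[article_id] = {"title": title, "body": body}
    # 2. on success, index the same document for search
    #    (with a real client: es.index(index="articles", id=article_id,
    #     document={"title": title, "body": body}))
    search_index[article_id] = {"title": title, "body": body}
    return article_id

save_article(1, "Hello", "First post")
assert database[1] == search_index[1]
```

The key property is that the indexing step runs in the same code path as the database write, so the two stores cannot silently drift apart the way a nightly batch job can.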
Another approach could be to reindex your whole system every night into another index and then switch the alias, but it's far from real time. I mean that it works well only if you don't care about updates to your DB during the day.
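The alias switch at the end of a nightly reindex can be done atomically with the Elasticsearch `_aliases` API. A sketch of the request, with hypothetical index names:

```json
POST /_aliases
{
  "actions": [
    { "remove": { "index": "posts-2016-01-01", "alias": "posts" } },
    { "add":    { "index": "posts-2016-01-02", "alias": "posts" } }
  ]
}
```

Because both actions run in a single request, searches against the `posts` alias never see an empty or half-built index during the swap.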
The closer you are to the application which is generating the data, the better.
So if you are using Drupal and have a connector for that, you should use it.
Same for other systems.
If you can't do that because there is no way to extend the application, then yes, you can use Logstash or elasticsearch-jdbc for that. Note that dealing with updates and deletes can be hard.
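For the Logstash route, a minimal pipeline using the jdbc input could look like the sketch below. The driver path, connection string, credentials, table, and index name are all placeholders you would adapt to your own database:

```conf
input {
  jdbc {
    jdbc_driver_library => "/path/to/mysql-connector-java.jar"   # hypothetical path
    jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/cms"  # hypothetical DB
    jdbc_user => "logstash"
    jdbc_password => "secret"
    statement => "SELECT id, title, body FROM posts"
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "posts"
    document_id => "%{id}"  # reuse the DB primary key so re-runs update instead of duplicating
  }
}
```

Setting `document_id` from the primary key handles updates (re-running the query overwrites changed rows), but deletes still need separate handling, e.g. soft-delete flags in the query or a full reindex.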
HTH
@dadoonet Your article is very interesting
I would like to set up this process:
Do you think it is a good idea?
I have one question: how can I know when Logstash has finished collecting the data?
Thank you in advance.
Do you think it is a good idea?
Yes.
How can I know when Logstash has finished collecting the data?
I think that Logstash will exit after the end of the job.
Look at the documentation: Jdbc input plugin | Logstash Reference [8.11] | Elastic
You can periodically schedule ingestion using a cron syntax (see the schedule setting), or run the query one time to load data into Logstash.
And Jdbc input plugin | Logstash Reference [8.11] | Elastic
So if you don't set schedule, your Logstash job will end after having processed all the data.
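To make the two modes concrete, the schedule setting takes a cron-style expression; a sketch (connection settings and timing are placeholders):

```conf
input {
  jdbc {
    # jdbc driver, connection string, credentials and statement go here
    # cron-style schedule: run the query every day at 02:00.
    # Omit this setting entirely to run the query once, after which
    # Logstash exits when it has processed all the data.
    schedule => "0 2 * * *"
  }
}
```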
Thank you again @dadoonet for your response. I checked, and it's true: Logstash exits at the end of the job.
I think that is one of the best solutions for zero downtime; the other solution, as you mentioned, would be to execute Elasticsearch commands directly from the application at the same time as the database transactions (MySQL, Oracle...).
Thank you again for your advice @dadoonet