We have a requirement to load data from Hadoop into an Elasticsearch index through a Spark job.
The condition for rolling over the index is the number of documents.
Below is a sample of what I am doing to achieve this:
The issue we are facing: after adding one document, the rollover to a new index does not happen until I manually run the rollover condition statement.
As a result, many documents get loaded into a single index, which is not what we want.
After I run the condition statement, the rollover happens and the new set of documents goes to the new index.
Please suggest how to achieve rollover in my scenario.
Please let me know if you need any other details.
Thanks & Regards,
Rollover does not happen automatically. You need to call the rollover API as often as you want Elasticsearch to check whether the rollover can be executed.
You can use Curator for that: https://www.elastic.co/guide/en/elasticsearch/client/curator/current/rollover.html
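If you would rather call the API yourself than use Curator, here is a minimal sketch in Python of what "calling the rollover API" could look like. It uses only the standard library. The URL `http://localhost:9200` and the alias name `my_write_alias` in the usage note are placeholders I made up, not values from this thread, and the sketch assumes an unsecured cluster (no SSL or authentication).

```python
import json
import urllib.request


def build_rollover_request(base_url, alias, max_docs):
    """Build (without sending) a POST request to the rollover API for `alias`."""
    body = json.dumps({"conditions": {"max_docs": max_docs}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{alias}/_rollover",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def try_rollover(base_url, alias, max_docs, send=urllib.request.urlopen):
    """Ask Elasticsearch to roll over `alias` if the condition is met.

    Returns the parsed JSON response; its "rolled_over" field tells you
    whether a new index was actually created by this call.
    `send` is injectable so the function can be exercised without a cluster.
    """
    request = build_rollover_request(base_url, alias, max_docs)
    with send(request) as response:
        return json.loads(response.read())
```

You would then run something like `try_rollover("http://localhost:9200", "my_write_alias", 1_000_000)` on a schedule (cron, or between batches of the Spark job). The condition is only evaluated at the moment of each call, never in between.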
Thanks for your kind reply.
As per the example mentioned above, if I set the condition to max_docs=1 and the data load is done through a Spark job that contains 10 documents, then all 10 documents get inserted into the same index.
Could you please help me understand how the condition can be evaluated at runtime, while the data load into the ES index is in progress?
max_docs=1 is not a normal value for rollover, so this example does not make sense in production.
It should be something more like 1m documents, which means that if the number of documents in your index is more than 1m when you call the rollover API, then a rollover will happen.
If you index, let's say, 1000 documents per second, then calling the rollover API every minute means that each index will most likely end up with somewhat more than 1m documents, because up to a minute of indexing can pass after the threshold is crossed.
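To make the arithmetic concrete, here is a rough sketch in Python. It assumes a constant indexing rate, a check that runs exactly on schedule, and a new index that starts empty. The function name is made up for illustration, and real numbers will vary.

```python
def rollover_size_range(max_docs, docs_per_second, check_interval_seconds):
    """Estimate how many documents an index holds when it gets rolled over.

    The rollover can only fire at a check, so the index holds at least
    `max_docs` documents, plus at most one full interval of indexing
    that happened after the threshold was crossed.
    """
    overshoot = docs_per_second * check_interval_seconds
    return max_docs, max_docs + overshoot


low, high = rollover_size_range(1_000_000, 1000, 60)
print(low, high)  # 1000000 1060000
```

The same reasoning explains the original problem: with max_docs=1 and a single bulk load of 10 documents, all 10 land in one index because no check runs in the middle of the load.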
Thanks for helping me understand through the suggested implementation.
Can you please let me know how to implement Curator as you suggested?
Do we need any admin support for the implementation, or can we implement Curator ourselves as developers?
Thanks & Regards,
It's more an admin tool IMO but I'm leaving the question to @theuntergeek.
What do you mean by "admin support?"
Curator can connect from anywhere to a cluster, provided there is a network path and no firewalls in between. If you have deployed security in your cluster, like the need for SSL certificates or user authentication, then you would need to have that information to be able to connect.