Our team has a deployment on Elastic Cloud, and we have purchased a standard cloud package. Due to storage issues, I now need to delete indices older than 30 days. I tried to connect to the cloud server over SSH using Git CMD on Windows, but that did not work.
I second @warkolm’s point here. If you’re using a recent release of Elasticsearch, which I would expect since you’re using Cloud, ILM is the way to go. If you have an edge case that ILM doesn’t address (yet), then you can still install Curator at a remote location and point it at your Cloud instance.
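For anyone who wants to see what "older than 30 days" means in terms of index names, here is a minimal Python sketch of the selection step only. It assumes daily indices named `<prefix>-YYYY.MM.DD`; the function name `indices_older_than` and the sample index names are hypothetical, and the actual deletion would still be left to ILM, Curator, or a delete-index request.

```python
import re
from datetime import date, datetime, timedelta

# Assumption: daily indices end in "-YYYY.MM.DD" (Beats-style naming).
DATE_SUFFIX = re.compile(r"-(\d{4}\.\d{2}\.\d{2})$")


def indices_older_than(names, days, today):
    """Return names whose date suffix is more than `days` days before `today`."""
    cutoff = today - timedelta(days=days)
    old = []
    for name in names:
        match = DATE_SUFFIX.search(name)
        if not match:
            continue  # no date suffix (e.g. system indices): never selected
        try:
            index_date = datetime.strptime(match.group(1), "%Y.%m.%d").date()
        except ValueError:
            continue  # suffix looks like a date but is not a valid one
        if index_date < cutoff:
            old.append(name)
    return old


if __name__ == "__main__":
    # Hypothetical index names, for illustration only.
    names = ["metricbeat-7.6.0-2020.02.01", "metricbeat-7.6.0-2020.03.15", ".kibana_1"]
    print(indices_older_than(names, 30, date(2020, 3, 31)))
```

Indices without a recognizable date suffix are skipped on purpose, so a pattern mismatch never results in something being selected for deletion.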
Thanks Aaron.
That's innovative: Curator at a remote location, pointing at the Cloud instance.
I shall give it a try and keep you posted.
If you have a sample of this, could you please share it? It would be helpful.
I installed Curator on a local machine and pointed it at Elastic Cloud.
Curator has been running for a very long time without any errors.
This worries me, as I am not sure whether Curator is running successfully or was interrupted without displaying any errors.
Note: I also tried with invalid credentials, which throws an error within a few minutes.
The Curator action below reindexes about 30 indices into a single index:
I understand that the indices are all a little heavy, with roughly 400,000+ documents each, but should it take this long (17+ hours and still executing) to complete?
I also tried to reindex a single index with just 479 documents, and it has been running for more than 5 hours.
Please find the screenshots for the details:
So, a reindex is not necessarily a rapid process, especially if you have other ingest events going on.
Additionally, without specifying otherwise (by setting a slices option), a reindex is a single-threaded operation, making it potentially even slower if other indexing or heavy I/O traffic is happening.
Was this request made while:
- Regular ingest was happening
- The big reindex was happening
If so, you added a third or fourth I/O heavy operation as a single thread, which could explain the slowness.
How large is your cluster (how many data nodes)?
On a separate note, is there any way you can change metricbeat from daily indexing to rollover indices? That would remove the need to reindex altogether, because a new index is only created when a total size in GB, a number of docs, or an age of data is reached. It is the preferred solution.
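As a rough illustration of those three rollover conditions, here is a small Python sketch that builds the path and JSON body for a rollover request. The alias name `metricbeat-write` and the threshold values are assumptions made up for the example, and the helper `build_rollover_request` is hypothetical; it only constructs the request and does not send anything.

```python
import json


def build_rollover_request(write_alias, max_age=None, max_docs=None, max_size=None):
    """Build (path, body) for a rollover call; only the given conditions are included."""
    conditions = {}
    if max_age is not None:
        conditions["max_age"] = max_age      # age of the index, e.g. "30d"
    if max_docs is not None:
        conditions["max_docs"] = max_docs    # number of documents
    if max_size is not None:
        conditions["max_size"] = max_size    # total size, e.g. "50gb"
    if not conditions:
        raise ValueError("at least one rollover condition is required")
    return f"/{write_alias}/_rollover", json.dumps({"conditions": conditions})


if __name__ == "__main__":
    # Hypothetical alias and thresholds, for illustration only.
    path, body = build_rollover_request("metricbeat-write", max_age="30d", max_size="50gb")
    print("POST", path)
    print(body)
```

The rollover takes place as soon as any one of the listed conditions is met, so it is fine to set only the ones that matter for your storage limits.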
Our use case is to back up the indices to a local ELK server and then delete them, and this should happen automatically.
The reindex Curator job is just a trial. Since this is my first Curator job and I was not sure about it, we did not want to start with delete operations. I wanted to confirm that the job I wrote works, and I did not know that reindexing would be such a long process.
Once Curator has started, is there a way to know that the process was kicked off or that something is happening? Are there any logs or anything else that tells me something is running in the background?
I am not sure whether any regular ingest was happening, but no, the big reindex was definitely not happening.
Our cluster has 5 instances: 1 master node, 2 data nodes, and one each for APM and Kibana.
Rollover indices are not an option for us, since I don't want to risk it. Our data is important, so we just want to back it up (elasticsearch-dump) and then delete it on request.
So I want to create a Curator job that first completes the backup/export and then deletes the indices.
One more thing: the Curator reindex is still running (for more than 27 hours now).
Okay, so that's 3 data nodes which are also master eligible (which is a bit different from 2 dedicated data nodes and a single master—you can see that all 3 are data, 2 are listed as master eligible, and 1 is master). Also, these data nodes having only 8G of RAM means this is a relatively small cluster in terms of CPU and memory.
Since it is not a large cluster, CPU and I/O are going to be a bit limited. A reindex is definitely going to go slower on this cluster, and slower still without setting slices to a higher number (if unset, the default is 1, meaning 1 single-threaded query feeding the reindex).
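To make the slices point concrete, here is a minimal Python sketch that builds a `_reindex` request with `slices` set and `wait_for_completion=false`, so the call returns a task instead of blocking. The helper `build_reindex_request` and the index names are hypothetical, and nothing is sent to a cluster; how many slices are sensible depends on your shard count and available resources.

```python
import json
from urllib.parse import urlencode


def build_reindex_request(source_indices, dest_index, slices=1, wait=False):
    """Build (path, body) for a _reindex call without sending it."""
    params = urlencode({
        "slices": slices,                           # default of 1 means a single worker
        "wait_for_completion": str(wait).lower(),   # "false" returns a task right away
    })
    body = {
        "source": {"index": source_indices},
        "dest": {"index": dest_index},
    }
    return "/_reindex?" + params, json.dumps(body)


if __name__ == "__main__":
    # Hypothetical index names, for illustration only.
    path, body = build_reindex_request(
        ["metricbeat-2020.03.01", "metricbeat-2020.03.02"], "metricbeat-2020.03", slices=3
    )
    print("POST", path)
    print(body)
```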
Also, the speed of a reindex isn't tied to Curator itself. Curator merely starts the process with a _reindex API call, which is then managed locally by the cluster itself.
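Because the reindex runs as a task inside the cluster, you can check whether it is alive and how far along it is by querying the task management API, for example `GET _tasks?detailed=true&actions=*reindex`. The Python sketch below only summarizes an already-parsed response; the function name `summarize_reindex_tasks` is hypothetical, the sample response is made up, and it assumes the task status carries the `total`, `created`, `updated`, and `deleted` counters.

```python
def summarize_reindex_tasks(tasks_response):
    """Return one progress summary per reindex task found in a _tasks response."""
    summaries = []
    for node in tasks_response.get("nodes", {}).values():
        for task_id, task in node.get("tasks", {}).items():
            if "reindex" not in task.get("action", ""):
                continue  # ignore unrelated tasks
            status = task.get("status", {})
            total = status.get("total", 0)
            done = (
                status.get("created", 0)
                + status.get("updated", 0)
                + status.get("deleted", 0)
            )
            percent = round(100 * done / total, 1) if total else 0.0
            summaries.append(
                {"task_id": task_id, "done": done, "total": total, "percent": percent}
            )
    return summaries


if __name__ == "__main__":
    # Made-up response fragment, for illustration only.
    sample = {
        "nodes": {
            "node-1": {
                "tasks": {
                    "node-1:42": {
                        "action": "indices:data/write/reindex",
                        "status": {"total": 400000, "created": 100000, "updated": 0, "deleted": 0},
                    }
                }
            }
        }
    }
    for summary in summarize_reindex_tasks(sample):
        print(summary)
```

If the summary list comes back empty, no reindex task is currently running on the cluster, which would suggest the job either finished or never started.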