(1) Can we move data from one index to another? I have some data in an index called latest-index, and after an action triggered from the back end I want to move it to archive-index. Currently I fetch all the data from latest-index, push it into archive-index, and then delete the documents from latest-index. This is overhead, because the data is already in my ES cluster and all I need to do is move it.
Is there any way to do this?
(2) For a get (search) query in Elasticsearch, the default result size is 10 (we see only 10 results by default), and the max size is 10000000 (10 million). Can this be made dynamic, so that if I have 400 records I get 400, and if I have more than 10 million I get them all, without having to specify the number of documents in the query?
Renaming an index is not the way to go and it's actually not supported. If you really want to do that, you have to reindex.
The best practice is to use an alias on top of your live indices.
When you want to change the alias from A to B, just switch the alias and you're done. Your users won't see it.
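As a rough sketch of what "switching the alias" can look like, the snippet below builds the JSON body for a `POST /_aliases` request that removes the alias from one index and adds it to another in a single call. The alias and index names are placeholders, and sending the request over HTTP is left out.

```python
import json


def alias_swap_body(alias, old_index, new_index):
    """Build the body for POST /_aliases that moves `alias` from
    old_index to new_index. Both actions go in one request, so
    clients never see the alias pointing at nothing."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


if __name__ == "__main__":
    # "logs", "index-a" and "index-b" are placeholder names.
    print(json.dumps(alias_swap_body("logs", "index-a", "index-b"), indent=2))
```

Applications keep querying the alias name, so nothing on the client side has to change when the alias moves.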
Also, please note that deleting 100% of the documents in an index, or a large part of them, is something you should avoid if possible. Again, prefer time-based indices (or any other policy that splits data across indices) with an alias on top of them.
Then, just remove the index which contains old documents and you're done. Removing by query is not a free operation.
If you want to move an index from big machines to smaller machines, that's definitely something you can do.
If you want to archive an old index, use snapshot and restore and then remove the old index.
If you want to keep your index in your cluster but don't want to use it, just close the index. It will still consume disk space but no memory or CPU anymore.
(2) If you need to extract a lot of results, don't rely on a large size; use the scroll API instead.
You can't move data without reindexing, and that's costly. Try to organize your data in such a way that such moves won't be necessary. It's hard to make concrete suggestions without more background information.
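If reindexing is unavoidable, Elasticsearch versions that ship the `_reindex` API can at least do the copy server-side, so the documents do not travel through the client. The sketch below only builds the request, using the index names from the question; whether that API is available depends on the cluster version, which is an assumption here.

```python
import json


def reindex_request(source_index, dest_index):
    """Return (method, path, body) for a server-side copy of every
    document from source_index into dest_index via the _reindex API.
    Assumes a cluster version that provides _reindex."""
    body = {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
    }
    return ("POST", "/_reindex", body)


if __name__ == "__main__":
    method, path, body = reindex_request("latest-index", "archive-index")
    print(method, path)
    print(json.dumps(body, indent=2))
```

This still reindexes every document, so it removes the network round trip but not the indexing cost.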
You can always specify a bigger size than the actual size of the results. However, if you want to retrieve more than maybe a few thousand documents (depending on their size, obviously) you should use scan & scroll instead.
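The scroll pattern boils down to a loop: run the initial search, then keep asking for the next page with the returned `_scroll_id` until a page comes back empty. A minimal sketch follows, with two hypothetical callables (`start_search` and `next_page`) standing in for the actual HTTP calls. It assumes a regular scroll where the first response already contains hits, which is not the case for the older `search_type=scan`.

```python
def iter_scroll_hits(start_search, next_page):
    """Yield every hit by following scroll ids until a page is empty.

    start_search() returns the first search response;
    next_page(scroll_id) returns the following page.
    Both are hypothetical wrappers around the HTTP requests.
    """
    response = start_search()
    while True:
        hits = response["hits"]["hits"]
        if not hits:
            break  # an empty page means the scroll is exhausted
        for hit in hits:
            yield hit
        # Always use the most recent scroll id; it may change between pages.
        response = next_page(response["_scroll_id"])
```

Because it is a generator, the caller never needs to know the total number of documents up front, which is what the question was after.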
I have some logs that I am pushing into Elasticsearch, and each log comes with an id.
Each time, I compare the current log id with the previous id.
(I have two indices: latest-log, which always holds the latest logs, and archive-logs, for all the older logs.)
If the id has changed, that means new documents have started coming in and it is time to archive all the records in the latest index to the archive index. So currently I fetch all the data from latest-index, push it into the archive index, and then delete the documents from latest-index.
Can't you just make the id part of the index name? If you want a stable index name that always refers to the current set of latest logs, use an index alias that you reconfigure to point to the index you're currently indexing into.
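One possible way to wire this up is sketched below. It assumes a hypothetical naming scheme `logs-<id>`, that the id is safe to use in an index name (lowercase, no special characters), and that both `latest-log` and `archive-logs` are aliases rather than concrete indices. Treating the archive as an alias is an extension of the suggestion above, not part of it. The new index also has to exist before the alias can be added to it.

```python
def index_for(log_id):
    """Hypothetical naming scheme: one index per log id."""
    return f"logs-{log_id}"


def rollover_actions(previous_id, current_id,
                     latest_alias="latest-log",
                     archive_alias="archive-logs"):
    """Return a POST /_aliases body that repoints the aliases when the
    log id changes, or None when the id is unchanged."""
    if previous_id == current_id:
        return None
    actions = []
    if previous_id is not None:
        old_index = index_for(previous_id)
        # Stop serving the old index as "latest" and expose it as archive.
        actions.append({"remove": {"index": old_index, "alias": latest_alias}})
        actions.append({"add": {"index": old_index, "alias": archive_alias}})
    actions.append({"add": {"index": index_for(current_id), "alias": latest_alias}})
    return {"actions": actions}
```

With this layout no documents are copied or deleted when the id changes; only the alias definitions move.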