When I run a reindex action in Curator 5.5 for the first time, it doesn't reindex all docs.
Only 998 docs are copied. If I then run it a second time, it finishes the rest of the 870k docs.
Logs do not show an error.
reindex.yml:
---
actions:
  1:
    action: create_index
    description: 'oinblogs: Create Weekly index.'
    options:
      disable_action: False
      continue_if_exception: True
      name: '<oinb-{now/w-3w{YYYY.ww}}>'
      extra_settings:
        settings:
          number_of_shards: 6
          number_of_replicas: 1
  2:
    description: 'oinblogs: Reindex oinblogs- daily indices into week index after x day'
    action: reindex
    options:
      disable_action: False
      wait_interval: 30
      max_wait: -1
      request_body:
        source:
          index: REINDEX_SELECTION
        dest:
          index: '<oinb-{now/w-3w{YYYY.ww}}>'
    filters:
    - filtertype: pattern
      kind: prefix
      value: oinb-
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 16
You should try using slices, which uses the sliced scroll functionality in Elasticsearch to parallelize the reindex more effectively. You're probably hitting a hot thread with your single-threaded approach.
I want to put them into a weekly index after 1 week, to reduce the number of open indices/shards.
We want to keep 1.5 months of data.
Is there a max number of shards per index?
Is it necessary to reindex the daily indices into a weekly index?
What is the advised number of shards for the daily index and for the weekly index?
Oh! It's probably a timing issue with the cluster state. You don't need to pre-create the index. The reindex will do that for you.
Additionally, there is absolutely no reason to have a 6 or 7 shard index on a 3 node cluster. You should not use more shards than data nodes, generally speaking. In fact, best practice is to not use more shards than necessary at all. You can easily fit 5GB to 50GB in a single shard, so multiple shards are overkill. If you're reindexing to reduce the number of shards, you should probably go to 1 shard per index. Have you considered using shrink instead of reindexing down?
If I don't pre-create the index, it will create an index with 5 shards, which are not evenly spread over the cluster nodes. I thought an even spread was advised.
My week indices will grow to:
Total: 7 * 25 = 175 GB
Primaries: 7 * 12 = 84 GB
So I could configure my weekly index to have 3 primary shards.
Should I configure my daily index to have 1 primary shard and 2 replicas?
I'm not per se trying to reduce the number of indices/shards.
I'm trying to configure it in the best way.
This is why I asked about the number 7. You still wouldn’t need a shard per day. And don’t count the total, only the amount per shard, like with your primaries. If you have 12.2G per day, you could go 4 days in a single shard. Your weekly index should only need 2 shards, to keep below the 50G/shard recommendation.
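For anyone who wants to redo this arithmetic with their own numbers, here is a rough Python sketch of the sizing rule described above. It assumes the 50G figure is used as a hard per-shard ceiling and that only primary data is counted. The function names `primary_shards_needed` and `days_per_shard` are made up for illustration and are not part of Curator or Elasticsearch.

```python
import math

# Assumption: treat the upper end of the 5GB-50GB guideline as the ceiling.
MAX_SHARD_GB = 50


def primary_shards_needed(primary_gb_per_day, days, max_shard_gb=MAX_SHARD_GB):
    """Smallest number of primary shards that keeps each shard under the ceiling."""
    total_primary_gb = primary_gb_per_day * days
    return max(1, math.ceil(total_primary_gb / max_shard_gb))


def days_per_shard(primary_gb_per_day, max_shard_gb=MAX_SHARD_GB):
    """Whole days of primary data that fit in one shard under the ceiling."""
    return int(max_shard_gb // primary_gb_per_day)


if __name__ == "__main__":
    daily_primary_gb = 12.2  # figure quoted in the reply above
    print("days per shard:", days_per_shard(daily_primary_gb))
    print("weekly primaries:", primary_shards_needed(daily_primary_gb, 7))
    print("daily primaries:", primary_shards_needed(daily_primary_gb, 1))
```

Replicas are left out on purpose, since the advice above is to count the amount per shard and not the cluster-wide total.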