Curator - Reindex does not complete on first run

When I run a reindex action in Curator 5.5 for the first time, it doesn't reindex all docs.
Only 998 docs are copied. If I then run it a second time, it finishes the rest of the 870k docs.
The logs do not show an error.

reindex.yml:

---
actions:
   1:
     action: create_index
     description: 'oinblogs: Create Weekly index.'
     options:
       disable_action: False
       continue_if_exception: True
       name: '<oinb-{now/w-3w{YYYY.ww}}>'
     extra_settings:
       settings:
         number_of_shards: 6
         number_of_replicas: 1

   2:
     description: 'oinblogs: Reindex oinblogs- daily indices into week index after x day'
     action: reindex
     options:
       disable_action: False
       wait_interval: 30
       max_wait: -1
       request_body:
         source:
           index: REINDEX_SELECTION
         dest:
           index: '<oinb-{now/w-3w{YYYY.ww}}>'
     filters:
     - filtertype: pattern
       kind: prefix
       value: oinb-
     - filtertype: age
       source: name
       direction: older
       timestring: '%Y.%m.%d'
       unit: days
       unit_count: 16

Has anyone else encountered this problem?

You should try using slices, which use the sliced scroll functionality in Elasticsearch to parallelize more effectively. You're probably hitting a hot thread with your single-threaded approach.
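Outside Curator, sliced scroll is exposed on the `_reindex` API through the `slices` URL parameter (Curator's reindex action has a matching `slices` option). A minimal sketch that only builds the request path and JSON body, with no client or network access; the source index names are hypothetical and `slices=3` assumes one slice per data node on this 3-node cluster.

```python
import json
from urllib.parse import urlencode


def build_sliced_reindex(source_indices, dest_index, slices=3):
    """Build the path and JSON body for a sliced POST /_reindex call.

    Assumption: one slice per data node (3 here); tune for your cluster.
    """
    # slices parallelizes the scroll; wait_for_completion=false returns a task id
    query = urlencode({"slices": slices, "wait_for_completion": "false"})
    body = {
        "source": {"index": list(source_indices)},
        "dest": {"index": dest_index},
    }
    return "/_reindex?" + query, json.dumps(body)


path, body = build_sliced_reindex(
    ["oinb-2018.03.01", "oinb-2018.03.02"],  # hypothetical daily index names
    "<oinb-{now/w-3w{YYYY.ww}}>",
)
print(path)
```

With `wait_for_completion=false`, Elasticsearch returns a task ID immediately, which can then be checked with the Tasks API (`GET _tasks/<task_id>`).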

When using slices, it gets to 2.3k docs and then stops.
A second run doesn't work either.

The strange thing is that when I don't create the index with the create_index action and let the reindex action create it instead, it doesn't stop. :thinking:

It looks like it has something to do with the number of shards.
6 or 7 shards per weekly index does not work.
5 works like a charm.

I have a 3-node cluster with 16 GB per Elasticsearch server.

Daily index:
Total: 24.4 GB
Primaries: 12.2 GB
Documents: 32.5m

I want to put them into a weekly index after 1 week, to reduce the number of open indices/shards.
We want to keep 1.5 months of data.
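To see what weekly indices buy in terms of open shards, count the shard copies held over the retention window: indices × primaries × (1 + replicas). A sketch with assumed, illustrative layouts (1 primary per daily index, 2 per weekly index, 1 replica each, and roughly 45 days for 1.5 months); these are not the poster's actual settings.

```python
import math


def shard_copies(num_indices, primaries, replicas):
    """Total shard copies (primaries plus replicas) the cluster must host."""
    return num_indices * primaries * (1 + replicas)


RETENTION_DAYS = 45  # assumption: ~1.5 months

# All-daily versus all-weekly retention, under the assumed layouts above.
daily = shard_copies(RETENTION_DAYS, primaries=1, replicas=1)
weekly = shard_copies(math.ceil(RETENTION_DAYS / 7), primaries=2, replicas=1)
print(daily, weekly)
```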

Is there a max number of shards per index?
Is it necessary to reindex the daily indices into weekly ones?
What is the recommended number of shards for the daily index and for the weekly index?

Oh! It's probably a timing issue with the cluster state. You don't need to pre-create the index. The reindex will do that for you.
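If pre-creating the index is still wanted, and if the cluster-state timing explanation is right, one option is to wait until the new index reports at least `yellow` health before starting the reindex. A minimal polling sketch; `get_status` is a hypothetical callable that should return the `status` field of `GET _cluster/health/<index>`, and the timeout and interval values are assumptions.

```python
import time

# Cluster health colours, worst to best.
_RANK = {"red": 0, "yellow": 1, "green": 2}


def wait_for_health(get_status, wanted="yellow", timeout=30.0, interval=1.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it reports `wanted` or better.

    get_status is a hypothetical callable returning "red", "yellow" or "green".
    Returns True once the wanted status is reached, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        if _RANK.get(get_status(), -1) >= _RANK[wanted]:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Usage with a stubbed status source (no cluster needed):
statuses = iter(["red", "red", "yellow"])
print(wait_for_health(lambda: next(statuses), sleep=lambda _: None))
```

The same wait can also be done server-side with `GET _cluster/health/<index>?wait_for_status=yellow&timeout=30s`.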

Additionally, there is absolutely no reason to have a 6- or 7-shard index on a 3-node cluster. Generally speaking, you should not use more shards than data nodes. In fact, best practice is not to use more shards than necessary at all. You can easily fit 5 GB to 50 GB in a single shard, so multiple shards are overkill. If you're reindexing to reduce the number of shards, you should probably go to 1 shard per index. Have you considered using shrink instead of reindexing?
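The sizing rule of thumb above can be written as a small function: enough primaries that each stays at or below a target size (50 GB, the upper end quoted above), plus a flag when that count exceeds the number of data nodes. Both the 50 GB ceiling and the node check are heuristics from this thread, not hard Elasticsearch limits.

```python
import math


def primary_shards_for(primary_gb, data_nodes, max_shard_gb=50.0):
    """Return (shards, exceeds_nodes) for an index with primary_gb of primary data."""
    shards = max(1, math.ceil(primary_gb / max_shard_gb))
    # Rule of thumb from the thread: avoid more primaries than data nodes.
    return shards, shards > data_nodes


print(primary_shards_for(12.2, data_nodes=3))  # one daily index -> (1, False)
```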

If I don't pre-create the index, it creates an index with 5 shards, which is not evenly spread over the cluster nodes. I thought an even spread was advised.

My week indices will grow to:
Total: 7 * 25 = 175 GB
Primaries: 7 * 12 = 84 GB

So I could configure my weekly index to have 3 primary shards.
Should I configure my daily index to have 1 primary shard and 2 replicas?
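One way to compare the two layouts being asked about is to compute the size of each shard and the average number of shard copies per node, assuming copies spread evenly over the 3 nodes. Elasticsearch does not place a primary and its replica on the same node, but a perfectly even spread is an assumption of this sketch.

```python
def layout(primary_gb, primaries, replicas, nodes):
    """Per-shard size in GB and average shard copies per node for one index."""
    per_shard_gb = primary_gb / primaries
    copies = primaries * (1 + replicas)
    return round(per_shard_gb, 1), copies / nodes


print(layout(84, primaries=3, replicas=1, nodes=3))    # weekly, 3 primaries
print(layout(12.2, primaries=1, replicas=2, nodes=3))  # daily, 1 primary + 2 replicas
```

Two replicas mean three full copies of the data: about 36.6 GB of disk per daily index instead of 24.4 GB.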

I'm not necessarily trying to reduce the number of indices/shards.
I'm trying to configure it in the best way.

How is 7 balanced?

You should use index templates to set the number of shards for new indices, as well as setting the appropriate mappings.
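A minimal sketch of such a template, written as the Python dict you would send to `PUT _template/<name>`. Assumptions: Elasticsearch 6.x legacy templates (5.x uses a `template` key instead of `index_patterns`), a hypothetical template name such as `oinb-daily`, 1 primary per daily index, and mappings left out because they depend on your data.

```python
import json


def build_template(pattern, shards, replicas, order=0):
    """Body for a legacy (6.x) index template: PUT _template/<name>."""
    return {
        "index_patterns": [pattern],
        "order": order,  # higher order wins when several templates match
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        },
        # "mappings": {...}  # add your field mappings here
    }


daily = build_template("oinb-*", shards=1, replicas=1)
print(json.dumps(daily, indent=2))
```

Note that `oinb-*` also matches the weekly names produced by the date math in the config, so giving weekly indices a distinct prefix and their own higher-`order` template is one way to set different shard counts for them.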

Thanks for the advice, I will also look into Shrink.
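Shrink has two constraints worth knowing up front: the target primary count must be a factor of the source count, and the source index must first be made read-only with a copy of every shard on one node. A sketch that lists valid targets and builds the request bodies; the node name `node-1` is hypothetical.

```python
def shrink_targets(source_shards):
    """Valid primary counts for a shrunken index: proper factors of the source count."""
    return [n for n in range(1, source_shards) if source_shards % n == 0]


def shrink_prepare_body(node_name):
    """PUT <source>/_settings: pin all shards to one node and block writes."""
    return {
        "settings": {
            "index.routing.allocation.require._name": node_name,
            "index.blocks.write": True,
        }
    }


def shrink_body(target_shards, replicas=1):
    """POST <source>/_shrink/<target> body."""
    return {
        "settings": {
            "index.number_of_shards": target_shards,
            "index.number_of_replicas": replicas,
            # clear the pinning and write block in case they carry over from the source
            "index.routing.allocation.require._name": None,
            "index.blocks.write": None,
        }
    }


print(shrink_targets(6))  # a 6-shard weekly index -> [1, 2, 3]
print(shrink_targets(5))  # a 5-shard index -> [1]
```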

7 as in 7 days in a week.

This is why I asked about the number 7. You still wouldn’t need a shard per day. Also, don’t count the total size, only the size per shard, as with your primaries. If you have 12.2 GB per day, you could fit 4 days in a single shard. Your weekly index should only need 2 shards to stay below the 50 GB/shard recommendation.
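The arithmetic in that answer, spelled out: with 12.2 GB of primary data per day and a 50 GB per-shard ceiling, one shard holds at most 4 days, and a 7-day index needs 2 primaries of roughly 43 GB each.

```python
import math

DAILY_PRIMARY_GB = 12.2  # from the daily index stats above
MAX_SHARD_GB = 50.0      # recommended upper bound per shard
DAYS = 7

days_per_shard = math.floor(MAX_SHARD_GB / DAILY_PRIMARY_GB)
weekly_primary_gb = DAILY_PRIMARY_GB * DAYS
weekly_shards = math.ceil(weekly_primary_gb / MAX_SHARD_GB)
per_shard_gb = weekly_primary_gb / weekly_shards

print(days_per_shard, weekly_shards, round(per_shard_gb, 1))  # 4 2 42.7
```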
