Reindex daily-indices in a single command


(Ryan Grannell) #1

Hi,

I've recently had to update the type-mapping for ~30 indices, changing several fields from text → keyword [1].

I now need to do this again, and am wondering if there is an easier method than the one shown below.

We currently have the frustrating procedure of:

  1. Posting our new index mapping matching the pattern 'primary-'
  2. Manually reindexing each "primary" index to a "swap" index with the new mapping applied
     primary-2018-04-04 → swap-primary-2018-04-04
     primary-2018-04-05 → swap-primary-2018-04-05
     .
     .
     .
     primary-2018-04-20 → swap-primary-2018-04-20
  3. Deleting each "primary-*" index
  4. Posting an alias mapping each "swap-primary-*" back to its "primary-*" name

Firstly, is there a single equivalent '_reindex' command that will map each "primary-" index to a corresponding "swap-primary-" index for us in one go? We could automate this using the Elasticsearch APIs, but I'd rather not if there's a built-in way of doing this.

Secondly, is it possible to change mappings for text fields → keyword fields 'in place'? I understand that in-place mapping updates are generally forbidden, but the raw data needed for this field is present in _source. If not, I'd love to see in-place field reindexing based on _source be added in the future.

Thanks for any help you can offer,
Ryan

[1] We don't use the default template-mapping as we have a worryingly large field-count, so having a '.keyword' duplicate for each text field would double this count. We do control this count using "enabled:false" & similar tools; I mention it only to point out duplicating fields under a different mapping would not solve this problem for us.


(Zachary Tong) #2

On the first question: afraid not. You'll have to do the dance (new index with appropriate mappings, reindex, delete, swap alias) each time. It's pretty straightforward to automate or wrap in a script, but it's not something we provide out of the box at the moment.
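
As a rough illustration of what wrapping the dance in a script could look like, here is a minimal Python sketch. It only builds the list of REST requests (method, path, body) for each daily index and sends nothing. The function names `plan_swap` and `plan_all` and the `swap-` prefix default are hypothetical; the sketch assumes the new mapping is already in place for the swap indices. A real script would also wait for each reindex to finish and compare document counts before issuing the delete.

```python
import json


def plan_swap(index, swap_prefix="swap-"):
    """Return the requests needed to move one index onto its swap index."""
    swap = swap_prefix + index
    return [
        # Copy documents into the swap index (created with the new mapping).
        ("POST", "/_reindex", {"source": {"index": index}, "dest": {"index": swap}}),
        # Remove the old index so its name is free to be used as an alias.
        ("DELETE", "/" + index, None),
        # Point the old name at the swap index.
        ("POST", "/_aliases", {"actions": [{"add": {"index": swap, "alias": index}}]}),
    ]


def plan_all(indices, swap_prefix="swap-"):
    """Concatenate the per-index plans, in the order the indices are given."""
    steps = []
    for index in indices:
        steps.extend(plan_swap(index, swap_prefix))
    return steps


if __name__ == "__main__":
    for method, path, body in plan_all(["primary-2018-04-04", "primary-2018-04-05"]):
        print(method, path, json.dumps(body) if body is not None else "")
```

Keeping the plan separate from the code that sends it makes it easy to review the requests, or dry-run them, before anything is deleted.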

In-place mapping changes are also not possible. It's a good thought and not immediately obvious why we forbid it. The main issue isn't the source data... like you said, it's available (unless you've disabled source, which could be a different issue).

The problem is that Lucene only knows how to map one data type to one field. During the "update in place", there's no way to overwrite the existing field's internal data structure with the keyword in place of the text.

It also poses a complicated question of how to handle searches that are going against a "mixed" index. Some data would need to be analyzed, other data wouldn't. The keyword field may have normalizers that are not compatible with the text's analyzer. Term and doc frequencies will get super messed up during the transition and completely ruin scoring. Etc. etc. :)

All in all, it's a very complicated situation that doesn't have a lot of immediately obvious "correct" answers, but does come with a lot of baggage and edge-cases.

It might be possible in the future as things continue to evolve, but at least for right now it isn't. Hope that helps provide a bit of background!


(Ryan Grannell) #3

Thanks Zachary, I appreciate the answers to both questions.


#4

I apologize for reviving this topic, but I'd like to ask what you think of this:

I was able to reindex daily indices (net-YYYY.mm.dd) as follows:

  • create a template with the mapping/settings for the new indices
    PUT _template/net-6
    {
     "index_patterns": ["net-6-*"],
     "order" : 1000,
     "settings": {
       "index": {
         "number_of_replicas": 0,
         "number_of_shards": 1
       }
     },
     "mappings": {
       "_doc": {
    ...
    
  • _reindex API with a Painless script to create daily indices based on the template above
    POST _reindex?wait_for_completion=false
    {
     "source": {
       "index": "net-20*",
       "type": "net"
     },
     "dest": {
       "index": "net-6",
       "type": "_doc"
     },
     "script": {
     "source": """
           ctx._index = "net-6-" + (ctx._index.substring('net-'.length(), ctx._index.length()));
           // some other stuff
    

P.S. As you can see, I don't really like dancing )
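
To see which destination names the rename line in the script above produces, here is a small Python mirror of that one Painless expression. It is only a sketch for checking names offline: the function name `dest_index` and its parameters are hypothetical, and like the original expression it strips a fixed number of characters without checking that the source name really starts with `net-`.

```python
def dest_index(src_index, old_prefix="net-", new_prefix="net-6-"):
    """Mirror of: "net-6-" + ctx._index.substring('net-'.length(), ctx._index.length())"""
    # Drop the first len(old_prefix) characters and prepend the new prefix.
    return new_prefix + src_index[len(old_prefix):]


if __name__ == "__main__":
    print(dest_index("net-2018.04.04"))  # net-6-2018.04.04
```

Each source index therefore lands in its own daily destination index, which matches the `net-6-*` pattern of the template.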


(Zachary Tong) #5

Ah nice, that's a clever use of the reindex script. Filing that away for future reference myself :)

