Good practice for index templates in ES

Let's say I have documents coming into my ES cluster from different sources. Source A (index name = "abde") has the following mapping:

{ 
                "field1": {
                    "type": "text"
                },
                "field2" : {
                    "type": "text"
                },
                "field3" : {
                    "ignore_above": 256, 
                    "type": "keyword"
                },
                "field4" : {
                    "ignore_above": 256, 
                    "type": "keyword"
                }
}

Source B (index name = "abcd") has the following mapping:

{ 
                "field1": {
                    "type": "text"
                },
                "field2" : {
                    "type": "text"
                },
                "field5" : {
                    "ignore_above": 256, 
                    "type": "keyword"
                },
                "field4" : {
                    "ignore_above": 256, 
                    "type": "keyword"
                }
}

Is it advisable to have separate index templates (one for abcd* and one for abde*), or a single template for ab* containing the union of all the fields? Keep in mind that some fields are common to the different sources and should be searchable across the different indices.
Would searches be affected by which of the two approaches I choose?
Also, the example above contains just 4 fields per source. The actual scenario has around 100 fields.
Any help would be appreciated.

This is a classic "it depends" answer. For me it's all about whether the data is similar or not. If it is similar right now and the fields don't differ that much, start with one template. If you see it diverging over time, go ahead and split it. Also, more than one template can match the same index, so you can have one index template for ab* that contains the common fields, plus specialized index templates for the concrete indices. In general I'd advise not going overboard with this, as there will always be someone coming after you who needs to maintain it.

--Alex

@spinscale Thanks for the quick response. In my scenario the log sources can be totally different (one might be Linux logs and another Windows logs), but some fields are common to both types, such as username, hostname, IP etc., and I would run queries filtering on these fields across log sources. So I wanted to know whether there would be any performance impact in the two scenarios. If there is none, I would prefer a single template.
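To make the kind of query meant here concrete, the following is a rough sketch only. It assumes the common fields are literally named `username` and `hostname`, that they are mapped as `keyword` in every index, and that all indices share the `ab*` prefix; the values `alice` and `web-01` are made up. It just builds and prints the search request, it does not contact a cluster.

```python
import json

# Hypothetical index pattern covering all log sources in this example.
INDEX_PATTERN = "ab*"


def common_field_search(username, hostname):
    """Build a search body that filters on two fields shared by all sources.

    Assumes both fields are mapped as `keyword`, so exact-match `term`
    filters are appropriate.
    """
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"username": username}},
                    {"term": {"hostname": hostname}},
                ]
            }
        }
    }


body = common_field_search("alice", "web-01")

# Print the request in the same console style used elsewhere in this thread.
print(f"GET /{INDEX_PATTERN}/_search")
print(json.dumps(body, indent=2))
```

The same request works whether the indices were created from one template or several, as long as the common fields have the same name and type everywhere.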

The general rule of thumb is to keep each mapping as small as possible in terms of the number of fields. A few additional fields are probably fine, but you never know how things will diverge over time.

@spinscale Thanks for the help.

Remember that you can order index templates, which lets you create a general low-order template that applies to all your indices and then higher-order templates with more specialized mappings.

With your two different indices you could create three index templates with slightly different mappings: one general template with all the common fields and two smaller templates with the fields unique to each index. For instance like this (assuming you're on ES7):

PUT /_template/ab_general
{
    "index_patterns" : ["ab*"],
    "order" : 0,
    "mappings" : {
        "properties" : {
            "field1" : {
                "type": "text"
            },
            "field2" : {
                "type": "text"
            },
            "field4" : {
                "ignore_above": 256,
                "type": "keyword"
            }
        }
    }
}

This order 0 template will be applied to all new indices created with a name starting with "ab".

To specify the unique fields, simply create two order 1 templates with the desired mappings:

PUT /_template/abde
{
    "index_patterns" : ["abde*"],
    "order" : 1,
    "mappings" : {
        "properties" : {
            "field3" : {
                "ignore_above": 256,
                "type": "keyword"
            }
        }
    }
}

and

PUT /_template/abcd
{
    "index_patterns" : ["abcd*"],
    "order" : 1,
    "mappings" : {
        "properties" : {
            "field5" : {
                "ignore_above": 256,
                "type": "keyword"
            }
        }
    }
}

Since the index_patterns are different, the last two can never both apply to the same index: "abde*" matches when you create the "abde" index, and "abcd*" matches when you create "abcd". In both cases the order 0 template also matches and is applied first (as an order 0 template should), with the order 1 template merged on top of it, giving you the desired mapping for each of your indices.
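If it helps to see the matching and ordering spelled out, here is a toy model in Python. It is only a sketch of the idea, not how Elasticsearch implements it: pattern matching is approximated with `fnmatchcase`, and the merge simply lets higher-order templates overwrite whole field definitions, whereas the real merge is more fine-grained. The `TEMPLATES` dict and `resolve_properties` function are names made up for this illustration.

```python
from fnmatch import fnmatchcase

# Simplified copies of the three templates above (field properties only).
TEMPLATES = {
    "ab_general": {
        "index_patterns": ["ab*"],
        "order": 0,
        "properties": {
            "field1": {"type": "text"},
            "field2": {"type": "text"},
            "field4": {"type": "keyword", "ignore_above": 256},
        },
    },
    "abde": {
        "index_patterns": ["abde*"],
        "order": 1,
        "properties": {"field3": {"type": "keyword", "ignore_above": 256}},
    },
    "abcd": {
        "index_patterns": ["abcd*"],
        "order": 1,
        "properties": {"field5": {"type": "keyword", "ignore_above": 256}},
    },
}


def resolve_properties(index_name, templates=TEMPLATES):
    """Return the merged field definitions a new index would start with.

    Toy approximation: collect every template whose pattern matches,
    apply them from lowest to highest order, and let later templates
    overwrite earlier definitions of the same field.
    """
    matching = [
        t for t in templates.values()
        if any(fnmatchcase(index_name, p) for p in t["index_patterns"])
    ]
    merged = {}
    for template in sorted(matching, key=lambda t: t["order"]):
        merged.update(template["properties"])
    return merged


for name in ("abde", "abcd"):
    print(name, sorted(resolve_properties(name)))
```

In this model "abde" ends up with field1, field2, field3 and field4, and "abcd" with field1, field2, field4 and field5, which matches the two mappings from the original question.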

The index template mechanism is very useful for "inheriting" fields or overriding defaults with more specialized mappings. I commonly use 3-4 templates in my clusters with all the general mappings in the order 0 template and then increasingly more specialized mappings in higher order templates. This makes it easy for me to maintain all my mappings and to extend them to support new types of indices.

@Bernt_Rostad Thanks for such a wonderful explanation. I think this helps for my use case.
Just one more thing: theoretically speaking, should separate templates (in my case, separate order 1 templates) give better search performance than a single big template?

The order 0 and order 1 index templates are blended into one mapping when a new index is created, so it doesn't really matter whether you use several smaller index templates or just one large one. The end result is the same, so search speed is not affected by the number of templates used.

There are two reasons I prefer having several smaller rather than one large index template:

  1. It's easier to reuse smaller mappings for several types of indices in a cluster.
  2. It's easier to maintain mappings if a given field is defined in only one index template.
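For the second point, a small check like the one below can flag fields that are defined in more than one template. This is just a sketch: the function name is made up, it assumes you have already loaded your template bodies into a dict keyed by template name (for example from the JSON you keep in version control), and it only looks at top-level fields under `mappings.properties`. In the sample data `field4` is duplicated on purpose.

```python
from collections import defaultdict


def fields_defined_more_than_once(templates):
    """Map each field name to the templates defining it, if more than one does.

    `templates` is a dict of {template_name: template_body}, where each body
    has the ES7 shape {"mappings": {"properties": {...}}}.
    """
    owners = defaultdict(list)
    for name, body in templates.items():
        for field in body.get("mappings", {}).get("properties", {}):
            owners[field].append(name)
    return {field: names for field, names in owners.items() if len(names) > 1}


# Sample data: "field4" is deliberately defined in both templates.
example = {
    "ab_general": {
        "mappings": {
            "properties": {
                "field1": {"type": "text"},
                "field4": {"type": "keyword", "ignore_above": 256},
            }
        }
    },
    "abde": {
        "mappings": {
            "properties": {
                "field3": {"type": "keyword", "ignore_above": 256},
                "field4": {"type": "keyword", "ignore_above": 256},
            }
        }
    },
}

duplicates = fields_defined_more_than_once(example)
print(duplicates)
```

An empty result means every field lives in exactly one template. A duplicate is not always a mistake, since a higher-order template may override a field deliberately, so treat the output as a list of things to review.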

Thanks. Got it.
