Elasticsearch Template Issue


(Madhukar) #1

I am creating an index template and defining the mapping in it.

Default Elasticsearch Settings
Sent 737 MB of data (size on disk; 173,170 documents) to Elasticsearch; its size in Elasticsearch was 320.1 MB (with the default index settings).

Template Settings
Then I created a template and sent the same data again.

PUT /_template/example-name
{
  "index_patterns" : [
    "example-pattern*"
  ],
  "settings" : {
    "index" : {
      "number_of_shards" : "5",
      "number_of_replicas" : "0",
      "refresh_interval" : "5s"
    }
  }
}

Now the size has increased:
Size = 956.2 MB
Documents = 173,170

Used Index Compression Settings
So I found an index setting for compression here: https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules.html#_static_index_settings
i.e. index.codec : best_compression

Used these settings:

    "settings" : {
      "index" : {
        "number_of_shards" : "5",
        "number_of_replicas" : "0",
        "refresh_interval" : "5s",
        "codec" : "best_compression"
      }
    }

And now the size is 843.8 MB.

Could you please suggest how to optimize the index so that its size is reduced?


(David Pilato) #2

Do you have the exact same scenario and data when you are doing your tests?
To do a real comparison, you should call the _forcemerge API at the end and merge down to one single segment.
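
For example, once all documents are indexed (the index name here is hypothetical):

```
POST /example-pattern-2019.01.01/_forcemerge?max_num_segments=1
```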

My 2 cents


(Madhukar) #3

Thanks for your response, David.
Yes, I have the exact same scenario and data for this test.
How do I use the _forcemerge API at the end while using a template?

PUT /_template/example-name/_forcemerge
{
  "index_patterns" : [
    "example-pattern*"
  ],
  "settings" : {
    "index" : {
      "number_of_shards" : "5",
      "number_of_replicas" : "0",
      "refresh_interval" : "5s",
      "codec" : "best_compression"
    }
  }
}
I used this, but got an illegal_argument_exception.


(Christian Dahlqvist) #4

Compression typically improves with shard size, so to make a fair comparison I would recommend indexing into an index with a single primary shard. As the size on disk will fluctuate while segments are merged in the background, it is important to force merge down to a single segment once indexing has completed; otherwise the indices could be at varying stages of merging.
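
A minimal fair-comparison setup along those lines (the index name is hypothetical) could be:

```
PUT /test-compare
{
  "settings" : {
    "index" : {
      "number_of_shards" : "1",
      "number_of_replicas" : "0"
    }
  }
}

# ... index the full data set ...

POST /test-compare/_forcemerge?max_num_segments=1

GET /test-compare/_stats/store
```

The final `_stats/store` call reports the on-disk store size after the merge has settled.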


(David Pilato) #5

I meant that you should call this API manually (https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-forcemerge.html) after you have ingested all the data, and then look at the size.

As @Christian_Dahlqvist said, one single shard would be even better.


(Madhukar) #6

Thanks David, but that's not what we want, because if we do this manually we would need to do it for every index.

The scenario here is that I want to create a template for a particular index pattern, so that any index matching that pattern is created automatically with the template settings. And I want dynamic mapping in it.
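
Note that a template is only applied when an index is created, so a force merge cannot be expressed in it; it has to run against existing indices. One way to automate it per pattern, as a sketch assuming Elasticsearch Curator is available (the pattern prefix below is an assumption), is a scheduled action file like:

```yaml
# Curator action file (sketch): force merge indices matching the
# pattern down to a single segment after indexing has finished.
actions:
  1:
    action: forcemerge
    description: "Force merge example-pattern indices to one segment"
    options:
      max_num_segments: 1
    filters:
      - filtertype: pattern
        kind: prefix
        value: example-pattern-
```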


(Madhukar) #7

Thanks @Christian_Dahlqvist; reducing to 1 shard helps reduce the size to some extent, but not by much.

See the screenshot. It is still larger than the original data ingested into Elasticsearch (173,170 documents).

image


(Christian Dahlqvist) #8

I would recommend you have a look at the following resource and try to optimise your mappings if you have not already done so:

https://www.elastic.co/guide/en/elasticsearch/reference/6.5/tune-for-disk-usage.html
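
For instance, following the tips in that guide, fields that are never scored or never queried can be mapped more cheaply. A mappings fragment as a sketch (the field names are purely illustrative, and the `_doc` type name assumes a 6.x cluster):

```
"mappings" : {
  "_doc" : {
    "properties" : {
      "message"    : { "type" : "text", "norms" : false },
      "session_id" : { "type" : "keyword", "index" : false }
    }
  }
}
```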


(Madhukar) #9

Thanks @Christian_Dahlqvist for your response.

We did the same thing today and it is working perfectly fine now. :slight_smile:

Below is a screenshot with the same number of documents; look at the size now.

ELK_1

Explanation:

So I created a template with the settings below:

  • Removed all my static fields of type "keyword" from the properties section of the mapping.
  • Included only those static fields to which I had assigned a custom type such as date, long, ip, etc.
  • Any other field is still created with type "keyword", because it is a dynamic template.

1 shard
0 replicas
codec => best_compression

  • Below is the index setting:

ELK_2

  • Below is the dynamic template setting:

ELK_3
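
For reference, the whole template described above might look roughly like this (the field names, the pattern, and the `_doc` type name are assumptions based on the thread, not taken from the screenshots):

```
PUT /_template/example-name
{
  "index_patterns" : ["example-pattern*"],
  "settings" : {
    "index" : {
      "number_of_shards" : "1",
      "number_of_replicas" : "0",
      "refresh_interval" : "5s",
      "codec" : "best_compression"
    }
  },
  "mappings" : {
    "_doc" : {
      "dynamic_templates" : [
        {
          "strings_as_keyword" : {
            "match_mapping_type" : "string",
            "mapping" : { "type" : "keyword" }
          }
        }
      ],
      "properties" : {
        "@timestamp" : { "type" : "date" },
        "client_ip"  : { "type" : "ip" }
      }
    }
  }
}
```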